Remotely sensed data are dominated by mixed Land Use and Land Cover (LULC)
types. Spectral unmixing (SU) is a key technique that disentangles mixed pixels
into constituent LULC types and their abundance fractions. While existing
studies on Deep Learning (DL) for SU typically focus on single time-step
hyperspectral (HS) or multispectral (MS) data, our work pioneers SU using MODIS
MS time series, addressing missing data with end-to-end DL models. Our approach
enhances a Long Short-Term Memory (LSTM)-based model by incorporating
geographic, topographic (geo-topographic), and climatic ancillary information.
Notably, our method eliminates the need for explicit endmember extraction,
instead learning the input-output relationship between mixed spectra and LULC
abundances through supervised learning. Experimental results demonstrate that
integrating spectral-temporal input data with geo-topographic and climatic
information significantly improves the estimation of LULC abundances in mixed
pixels. To facilitate this study, we curated a novel labeled dataset for
Andalusia (Spain) with monthly MODIS multispectral time series at 460m
resolution for 2013. Named Andalusia MultiSpectral MultiTemporal Unmixing
(Andalusia-MSMTU), this dataset provides pixel-level annotations of LULC
abundances along with ancillary information. The dataset
(https://zenodo.org/records/7752348) and code
(https://github.com/jrodriguezortega/MSMTU) are available to the public.
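The supervised setup above can be sketched as a recurrent model that consumes a monthly spectral sequence plus ancillary features and emits abundance fractions. The following minimal NumPy forward pass is only an illustration of that input-output mapping; all weights, dimensions, and the single-cell architecture are hypothetical, not the paper's model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_abundances(x_seq, ancillary, params):
    """Run a single-layer LSTM over a monthly spectral sequence, append
    ancillary (geo-topographic/climatic) features, and map the result to
    LULC abundance fractions via softmax, so they are non-negative and
    sum to one."""
    Wx, Wh, b, Wo, bo = params          # gate weights and output head
    H = Wh.shape[1]                     # hidden size
    h = np.zeros(H); c = np.zeros(H)
    for x_t in x_seq:                   # iterate over time steps
        z = Wx @ x_t + Wh @ h + b       # all four gates stacked
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
    feat = np.concatenate([h, ancillary])
    logits = Wo @ feat + bo
    e = np.exp(logits - logits.max())
    return e / e.sum()                  # abundance fractions
```

The softmax output head is what removes the need for explicit endmember extraction: abundances are predicted directly under a sum-to-one constraint.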
In the field of clinical medicine, computed tomography (CT) is an effective
medical imaging modality for the diagnosis of various pathologies. Compared
with X-ray images, CT images can provide more information, including
multi-planar slices and three-dimensional structures for clinical diagnosis.
However, CT imaging requires patients to be exposed to large doses of ionizing
radiation for a long time, which may cause irreversible physical harm. In this
paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on
generative radiance fields. The network can learn a continuous representation
of CT projections from 2D X-ray images by obtaining the internal structure and
depth information and using adaptive loss weights to ensure the quality of the
generated images. Our model is trained on publicly available knee and chest
datasets, and we show the results of CT projection rendering with a single
X-ray and compare our method with other methods based on generative radiance
fields.
We propose a decoder-only language model, VoxtLM, that can perform four
tasks: speech recognition, speech synthesis, text generation, and speech
continuation. VoxtLM integrates text vocabulary with discrete speech tokens
from self-supervised speech features and uses special tokens to enable
multitask learning. Compared to a single-task model, VoxtLM exhibits a
significant improvement in speech synthesis, with improvements in both speech
intelligibility from 28.9 to 5.6 and objective quality from 2.68 to 3.90.
VoxtLM also improves speech generation and speech recognition performance over
the single-task counterpart. Further, VoxtLM is trained with publicly available
data, and the training recipes and model checkpoints are open-sourced to make
the work fully reproducible.
Predicting next visit diagnosis using Electronic Health Records (EHR) is an
essential task in healthcare, critical for devising proactive future plans for
both healthcare providers and patients. Nonetheless, many preceding studies
have not sufficiently addressed the heterogeneous and hierarchical
characteristics inherent in EHR data, inevitably leading to sub-optimal
performance. To this end, we propose NECHO, a novel medical code-centric
multimodal contrastive EHR learning framework with hierarchical regularisation.
First, we integrate multifaceted information encompassing medical codes,
demographics, and clinical notes using a tailored network design and a pair of
bimodal contrastive losses, all of which pivot around a medical code
representation. We also regularise the modality-specific encoders using
parent-level information from the medical ontology to learn the hierarchical
structure of EHR data. A series of experiments on MIMIC-III data demonstrates
the effectiveness of our approach.
With the rise in communication capacity, deep neural networks (DNN) for
digital pre-distortion (DPD) to correct non-linearity in wideband power
amplifiers (PAs) have become prominent. Yet, there is a void in open-source and
measurement-setup-independent platforms for fast DPD exploration and objective
DPD model comparison. This paper presents an open-source framework, OpenDPD,
crafted in PyTorch, with an associated dataset for PA modeling and DPD
learning. We introduce a Dense Gated Recurrent Unit (DGRU)-DPD, trained via a
novel end-to-end learning architecture, outperforming previous DPD models on a
digital PA (DPA) in the new digital transmitter (DTX) architecture with
unconventional transfer characteristics compared to analog PAs. Measurements
show our DGRU-DPD achieves an ACPR of -44.69/-44.47 dBc and an EVM of -35.22 dB
for 200 MHz OFDM signals. OpenDPD code, datasets, and documentation are
publicly available at https://github.com/lab-emi/OpenDPD.
Self-supervised learning (SSL) has emerged as a promising paradigm for
learning flexible speech representations from unlabeled data. By designing
pretext tasks that exploit statistical regularities, SSL models can capture
useful representations that are transferable to downstream tasks. This study
provides an empirical analysis of Barlow Twins (BT), an SSL technique inspired
by theories of redundancy reduction in human perception. On downstream tasks,
BT representations accelerated learning and transferred across domains.
However, limitations exist in disentangling key explanatory factors, with
redundancy reduction and invariance alone insufficient for factorization of
learned latents into modular, compact, and informative codes. Our ablation
study isolated gains from invariance constraints, but the gains were
context-dependent. Overall, this work substantiates the potential of Barlow
Twins for sample-efficient speech encoding. However, challenges remain in
achieving fully hierarchical representations. The analysis methodology and
insights pave a path for extensions incorporating further inductive priors and
perceptual principles to further enhance the BT self-supervision framework.
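For reference, the BT objective drives the cross-correlation matrix of two augmented views' embeddings toward the identity, which is exactly the redundancy-reduction principle discussed above. A minimal NumPy version (the trade-off weight `lam` is an illustrative choice, not the study's setting):

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins objective: push the cross-correlation matrix of two
    views' embeddings toward the identity. Diagonal terms enforce
    invariance; off-diagonal terms reduce redundancy between dimensions."""
    n = z_a.shape[0]
    # standardize each embedding dimension over the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
    c = z_a.T @ z_b / n                          # cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()    # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```

Identical views give a near-zero invariance term, while anti-correlated views are heavily penalized, which is what makes the loss a useful self-supervision signal without negative pairs.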
In replay-based methods for continual learning, replaying input samples in
episodic memory has shown its effectiveness in alleviating catastrophic
forgetting. However, the role of the softmax cross-entropy loss as a potential
key factor in causing catastrophic forgetting has been underexplored. In this
paper, we analyze the effect of softmax and revisit softmax masking with
negative infinity to shed light on its ability to mitigate catastrophic
forgetting. Based on the analyses, it is found that negative infinity masked
softmax is not always compatible with dark knowledge. To improve the
compatibility, we propose a general masked softmax that controls the stability
by adjusting the gradient scale to old and new classes. We demonstrate that
utilizing our method on other replay-based methods results in better
performance, primarily by enhancing model stability in continual learning
benchmarks, even when the buffer size is set to an extremely small value.
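The masking idea can be sketched in a few lines: negative-infinity masking zeroes old-class probabilities outright, while a finite mask value only damps them. The finite-mask generalization shown here is one plausible form of "controlling the gradient scale", not necessarily the paper's exact parameterization:

```python
import numpy as np

def masked_softmax(logits, old_class_mask, m=np.inf):
    """Masked softmax over class logits. With m = inf this is the classic
    negative-infinity masking that zeroes old-class probabilities (and
    their gradients); a finite m only damps them, trading plasticity
    for stability."""
    mask = old_class_mask.astype(bool)
    z = np.where(mask, logits - m, logits)  # subtract m from old-class logits
    z = z - z.max()                         # numerical stability
    p = np.exp(z)
    return p / p.sum()
```

With `m=np.inf` the masked classes receive exactly zero probability; with a moderate finite `m` they retain some mass, which is one way to stay compatible with dark knowledge.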
Efficiently generating energetically stable crystal structures has long been
a challenge in materials design, primarily due to the immense number of
possible atomic arrangements in a crystal lattice. To facilitate the discovery
of stable materials, we
present a framework for the generation of synthesizable materials, leveraging a
point cloud representation to encode intricate structural information. At the
heart of this framework lies the introduction of a diffusion model as its
foundational pillar. To gauge the efficacy of our approach, we employ it to
reconstruct input structures from our training datasets, rigorously validating
its high reconstruction performance. Furthermore, we demonstrate the profound
potential of Point Cloud-Based Crystal Diffusion (PCCD) by generating entirely
new materials, emphasizing their synthesizability. Our research stands as a
noteworthy contribution to the advancement of materials design and synthesis
through the cutting-edge avenue of generative design instead of the
conventional substitution or experience-based discovery.
We present a small study analyzing how prompt token classification loss
weighting (PLW) affects the performance of 7B-size LLaMA models fine-tuned on
instruction tasks. We recreated Stanford's Alpaca experiment with both LLaMA 1
and LLaMA 2 using multiple instruction datasets. We found that models
fine-tuned on our short-completion dataset have a negative quadratic
relationship with PLW while models fine-tuned on long-completion datasets were
unaffected by PLW.
The Surrogate Modeling Toolbox (SMT) is an open-source Python package that
offers a collection of surrogate modeling methods, sampling techniques, and a
set of sample problems. This paper presents SMT 2.0, a major new release of SMT
that introduces significant upgrades and new features to the toolbox. This
release adds the capability to handle mixed-variable surrogate models and
hierarchical variables. These types of variables are becoming increasingly
important in several surrogate modeling applications. SMT 2.0 also improves SMT
by extending sampling methods, adding new surrogate models, and computing
variance and kernel derivatives for Kriging. This release also includes new
functions to handle noisy data and to use multifidelity data. To the best of our
knowledge, SMT 2.0 is the first open-source surrogate library to propose
surrogate models for hierarchical and mixed inputs. This open-source software
is distributed under the New BSD license.
Recently, there has been a growing interest for mixed-categorical meta-models
based on Gaussian process (GP) surrogates. In this setting, several existing
approaches use different strategies either by using continuous kernels (e.g.,
continuous relaxation and Gower distance based GP) or by using a direct
estimation of the correlation matrix. In this paper, we present a kernel-based
approach that extends continuous exponential kernels to handle
mixed-categorical variables. The proposed kernel leads to a new GP surrogate
that generalizes both the continuous relaxation and the Gower distance based GP
models. We demonstrate, on both analytical and engineering problems, that our
proposed GP model gives a higher likelihood and a smaller residual error than
the other kernel-based state-of-the-art models. Our method is available in the
open-source software SMT.
A major challenge in Natural Language Processing is obtaining annotated data
for supervised learning. An option is the use of crowdsourcing platforms for
data annotation. However, crowdsourcing introduces issues related to the
annotator's experience, consistency, and biases. An alternative is to use
zero-shot methods, which in turn have limitations compared to their few-shot or
fully supervised counterparts. Recent advancements driven by large language
models show potential, but struggle to adapt to specialized domains with
severely limited data. The most common approach therefore has a human annotator
randomly label a set of datapoints to build an initial dataset. But
randomly sampling data to be annotated is often inefficient as it ignores the
characteristics of the data and the specific needs of the model. The situation
worsens when working with imbalanced datasets, as random sampling tends to
heavily bias towards the majority classes, leading to excessive annotated data.
To address these issues, this paper contributes an automatic and informed data
selection architecture to build a small dataset for few-shot learning. Our
proposal minimizes the quantity and maximizes diversity of data selected for
human annotation, while improving model performance.
This paper describes $\pi2\text{vec}$, a method for representing behaviors of
black box policies as feature vectors. The policy representations capture how
the statistics of foundation model features change in response to the policy
behavior in a task-agnostic way, and can be trained from offline data, allowing
them to be used in offline policy selection. This work provides a key piece of
a recipe for fusing together three modern lines of research: Offline policy
evaluation as a counterpart to offline RL, foundation models as generic and
powerful state representations, and efficient policy selection in resource
constrained environments.
In recent years, various powerful policy gradient algorithms have been
proposed in deep reinforcement learning. While all these algorithms build on
the Policy Gradient Theorem, the specific design choices differ significantly
across algorithms. We provide a holistic overview of on-policy policy gradient
algorithms to facilitate the understanding of both their theoretical
foundations and their practical implementations. In this overview, we include a
detailed proof of the continuous version of the Policy Gradient Theorem,
convergence results and a comprehensive discussion of practical algorithms. We
compare the most prominent algorithms on continuous control environments and
provide insights on the benefits of regularization. All code is available at
https://github.com/Matt00n/PolicyGradientsJax.
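As a concrete instance of the sampled Policy Gradient Theorem underlying all the algorithms surveyed, here is a minimal Monte-Carlo (REINFORCE) gradient estimator for a softmax-linear policy. This is the generic textbook form, not tied to any specific algorithm in the overview:

```python
import numpy as np

def reinforce_gradient(theta, states, actions, returns):
    """Monte-Carlo policy-gradient (REINFORCE) estimate for a softmax
    policy with linear features: grad J is approximated by the sample
    mean of grad log pi(a|s) * G_t."""
    grad = np.zeros_like(theta)               # theta: (n_actions, n_features)
    for s, a, G in zip(states, actions, returns):
        logits = theta @ s
        p = np.exp(logits - logits.max())
        p /= p.sum()
        # grad log pi(a|s) for a softmax-linear policy: (one_hot(a) - p) x s
        one_hot = np.zeros(len(p))
        one_hot[a] = 1.0
        grad += np.outer(one_hot - p, s) * G
    return grad / len(actions)
```

Practical algorithms such as PPO differ mainly in how they control the variance and step size of exactly this estimate.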
Synthesizing performing guitar sound is a highly challenging task due to the
polyphony and high variability in expression. Recently, deep generative models
have shown promising results in synthesizing expressive polyphonic instrument
sounds from music scores, often using a generic MIDI input. In this work, we
propose an expressive acoustic guitar sound synthesis model with a customized
input representation to the instrument, which we call guitarroll. We implement
the proposed approach using diffusion-based outpainting which can generate
audio with long-term consistency. To overcome the lack of MIDI/audio-paired
datasets, we used not only an existing guitar dataset but also collected data
from a high quality sample-based guitar synthesizer. Through quantitative and
qualitative evaluations, we show that our proposed model has higher audio
quality than the baseline model and generates more realistic timbres than
the previous leading work.
In this paper, we present a novel approach for detecting the discontinuity
interfaces of a discontinuous function. This approach leverages Graph-Informed
Neural Networks (GINNs) and sparse grids to address discontinuity detection
even in domains of dimension larger than three. GINNs, trained to identify
troubled points on sparse grids, exploit graph structures built on the grids
to achieve efficient and accurate discontinuity detection. We also introduce
a recursive algorithm for general sparse grid-based detectors, characterized by
convergence properties and easy applicability. Numerical experiments on
functions with dimensions n = 2 and n = 4 demonstrate the efficiency and robust
generalization of GINNs in detecting discontinuity interfaces. Notably, the
trained GINNs offer portability and versatility, allowing integration into
various algorithms and sharing among users.
Cross-validation is a widely used technique for assessing the performance of
predictive models on unseen data. Many predictive models, such as Kernel-Based
Partial Least-Squares (PLS) models, require the computation of
$\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$
using only training set samples from the input and output matrices,
$\mathbf{X}$ and $\mathbf{Y}$, respectively. In this work, we present three
algorithms that efficiently compute these matrices. The first one allows no
column-wise preprocessing. The second one allows column-wise centering around
the training set means. The third one allows column-wise centering and
column-wise scaling around the training set means and standard deviations.
Demonstrating correctness and superior computational complexity, they offer
significant cross-validation speedup compared with straightforward
cross-validation and previous work on fast cross-validation - all without data
leakage. Their suitability for parallelization is highlighted with an
open-source Python implementation combining our algorithms with Improved Kernel
PLS.
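The core trick behind the first algorithm (no column-wise preprocessing) can be sketched in a few lines: precompute the full Gram matrices once, then obtain each training fold's matrices by subtracting the held-out rows' contribution. This is a simplified illustration of the idea, not the paper's exact implementation:

```python
import numpy as np

def fold_gram_matrices(X, Y, val_idx):
    """Fast cross-validation for Kernel-Based PLS: compute the full
    X^T X and X^T Y once, then get each training fold's matrices by
    subtracting the held-out rows' contribution. Valid when no
    column-wise preprocessing (centering/scaling) is applied."""
    XtX, XtY = X.T @ X, X.T @ Y          # done once, reused for every fold
    Xv, Yv = X[val_idx], Y[val_idx]
    return XtX - Xv.T @ Xv, XtY - Xv.T @ Yv
```

Since the subtraction involves only the held-out rows, each fold costs far less than recomputing the Gram matrices from scratch, and no validation-set information leaks into the training matrices.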
A method for solving elasticity problems based on separable physics-informed
neural networks (SPINN) in conjunction with the deep energy method (DEM) is
presented. Numerical experiments have been carried out for a number of problems
showing that this method has a significantly higher convergence rate and
accuracy than the vanilla physics-informed neural networks (PINN) and even
SPINN based on a system of partial differential equations (PDEs). In addition,
using SPINN within the DEM framework makes it possible to solve problems of the
linear theory of elasticity on complex geometries, which is unachievable with
PDE-based PINNs. The considered problems are close to industrial problems in
terms of geometry, loading, and material parameters.
We consider the ubiquitous linear inverse problems with additive Gaussian
noise and propose an unsupervised sampling approach called diffusion model
based posterior sampling (DMPS) to reconstruct the unknown signal from noisy
linear measurements. Specifically, using one diffusion model (DM) as an
implicit prior, the fundamental difficulty in performing posterior sampling is
that the noise-perturbed likelihood score, i.e., gradient of an annealed
likelihood function, is intractable. To circumvent this problem, we introduce a
simple yet effective closed-form approximation using an uninformative prior
assumption. Extensive experiments are conducted on a variety of noisy linear
inverse problems such as noisy super-resolution, denoising, deblurring, and
colorization. In all tasks, the proposed DMPS demonstrates highly competitive
or even better performance while being 3 times faster than the
state-of-the-art competitor, diffusion posterior sampling (DPS).
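The structure of the problem can be summarized by the standard posterior score decomposition: the diffusion model supplies the prior score, while the second term is the intractable piece that DMPS approximates in closed form:

```latex
\nabla_{x_t}\log p_t(x_t \mid y)
  = \underbrace{\nabla_{x_t}\log p_t(x_t)}_{\text{diffusion model (prior score)}}
  + \underbrace{\nabla_{x_t}\log p_t(y \mid x_t)}_{\text{noise-perturbed likelihood score (intractable)}}
```

The uninformative-prior assumption mentioned above is what turns the second term into a closed-form expression.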
In this work we demonstrate that significant gains in performance and data
efficiency can be achieved in High Energy Physics (HEP) by moving beyond the
standard paradigm of sequential optimization of reconstruction and analysis
components. We conceptually connect HEP reconstruction and analysis to modern
machine learning workflows such as pretraining, finetuning, domain adaptation
and high-dimensional embedding spaces, and quantify the gains in the example
use case of searches for heavy resonances decaying via an intermediate di-Higgs
system to four $b$-jets.
Large language models (LLMs) are generating information at a rapid pace,
requiring users to increasingly rely on and trust the data. Despite the
remarkable advances of LLMs, the information they generate is not completely
trustworthy due to challenges in information quality. Specifically, the
integrity of information quality decreases because of unreliable and biased
tokenization during LLM pre-training. Moreover, decreased information quality
leads to hallucination and fabricated information. Unreliable information can
lead to flawed decisions in businesses, which impacts economic activity. In
this work, we introduce a novel mathematical evaluation of LLM information
quality; we furthermore analyze and highlight information quality challenges
and scaling laws for systematically scaling language models.
The SINDy algorithm has been successfully used to identify the governing
equations of dynamical systems from time series data. However, SINDy assumes
the user has prior knowledge of the variables in the system and of a function
library that can act as a basis for the system. In this paper, we demonstrate
on real world data how the Augmented SINDy algorithm outperforms SINDy in the
presence of system-variable uncertainty. We then show SINDy can be further
augmented to perform robustly when both variable and library uncertainty are
present.
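The baseline SINDy step referenced above is sparse regression of derivatives onto a candidate function library. A minimal sequential-thresholding sketch (a textbook form of the algorithm, not the augmented variant):

```python
import numpy as np

def sindy(x, dx, library, threshold=0.1, n_iter=10):
    """Plain SINDy: regress derivatives dx on a candidate function
    library and iteratively zero out small coefficients (sequential
    thresholding), yielding a sparse model of the dynamics."""
    Theta = np.column_stack([f(x) for f in library])      # library matrix
    xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]        # initial fit
    for _ in range(n_iter):
        big = np.abs(xi) >= threshold
        xi[~big] = 0.0                                    # prune small terms
        if big.any():                                     # refit survivors
            xi[big] = np.linalg.lstsq(Theta[:, big], dx, rcond=None)[0]
    return xi
```

The prior knowledge the paper questions is visible here: the user must supply both the state `x` and the `library` of candidate basis functions.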
Emotion Recognition in Conversation (ERC) plays a crucial role in enabling
dialogue systems to effectively respond to user requests. The emotions in a
conversation can be identified by the representations from various modalities,
such as audio, visual, and text. However, due to the weak contribution of
non-verbal modalities to recognizing emotions, multimodal ERC has always been
considered a challenging task. In this paper, we propose Teacher-leading
Multimodal fusion network for ERC (TelME). TelME incorporates cross-modal
knowledge distillation to transfer information from a language model acting as
the teacher to the non-verbal students, thereby optimizing the efficacy of the
weak modalities. We then combine multimodal features using a shifting fusion
approach in which student networks support the teacher. TelME achieves
state-of-the-art performance on MELD, a multi-speaker conversation dataset for
ERC. Finally, we demonstrate the effectiveness of our components through
additional experiments.
Receiver operating characteristic (ROC) analysis is widely used for
evaluating diagnostic systems. Recent studies have shown that estimating an
area under ROC curve (AUC) with standard cross-validation methods suffers from
a large bias. The leave-pair-out (LPO) cross-validation has been shown to
correct this bias. However, while LPO produces an almost unbiased estimate of
AUC, it does not provide a ranking of the data needed for plotting and
analyzing the ROC curve. In this study, we propose a new method called
tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by
creating a tournament from pair comparisons to produce a ranking for the data.
TLPO preserves the advantage of LPO for estimating AUC, while it also allows
performing ROC analyses. We have shown using both synthetic and real world data
that TLPO is as reliable as LPO for AUC estimation, and confirmed the bias in
leave-one-out cross-validation on low-dimensional data. As a case study on ROC
analysis, we also evaluate how reliably sensitivity and specificity can be
estimated from TLPO ROC curves.
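The tournament step can be sketched simply: tally each example's pairwise wins across the left-out pairs and sort. This is a toy illustration of the ranking idea, not the paper's full cross-validation procedure:

```python
import numpy as np

def tlpo_ranking(pair_results):
    """Tournament step of TLPO: each left-out pair (i, j, i_wins) is one
    'match'; ranking examples by number of matches won turns pairwise
    predictions into a full ordering usable for an ROC curve."""
    n = 1 + max(max(i, j) for i, j, _ in pair_results)
    wins = np.zeros(n)
    for i, j, i_wins in pair_results:
        wins[i if i_wins else j] += 1
    return np.argsort(-wins)       # indices from most to fewest wins
```

The win counts preserve the pairwise information LPO already computes, which is why AUC estimation is unaffected while a full ranking becomes available.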
In causal inference with panel data under staggered adoption, the goal is to
estimate and derive confidence intervals for potential outcomes and treatment
effects. We propose a computationally efficient procedure, involving only
simple matrix algebra and singular value decomposition. We derive
non-asymptotic bounds on the entrywise error, establishing its proximity to a
suitably scaled Gaussian variable. Despite its simplicity, our procedure turns
out to be instance-optimal, in that our theoretical scaling matches a local
instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. Using
our insights, we develop a data-driven procedure for constructing entrywise
confidence intervals with pre-specified coverage guarantees. Our analysis is
based on a general inferential toolbox for the SVD algorithm applied to the
matrix denoising model, which might be of independent interest.
In this post, we show you how to bring Amazon Q, your business expert, to users in Microsoft Teams. (If you use Slack, refer to Deploy a Slack gateway for Amazon Q, your business expert.) You'll be able to converse with Amazon Q using Teams direct messages (DMs) to ask questions and get answers based on company data, get help creating new content such as email drafts, summarize attached files, and perform tasks.
This GFN Thursday levels up PC gaming on mobile with higher-resolution support on Android devices. This week also brings 10 new games to the GeForce NOW library, including Enshrouded. GeForce NOW transforms nearly any device into a high-powered PC gaming rig, and members streaming on Android can now access that power at higher resolutions.
This paper introduces an \textit{online bilevel optimization} setting in
which a sequence of time-varying bilevel problems are revealed one after the
other. We extend the known regret bounds for single-level online algorithms to
the bilevel setting. Specifically, we provide new notions of \textit{bilevel
regret}, develop an online alternating time-averaged gradient method that is
capable of leveraging smoothness, and give regret bounds in terms of the
path-length of the inner and outer minimizer sequences.
The wave equation is an important physical partial differential equation, and
in recent years, deep learning has shown promise in accelerating or replacing
traditional numerical methods for solving it. However, existing deep learning
methods suffer from high data acquisition costs, low training efficiency, and
insufficient generalization capability for boundary conditions. To address
these issues, this paper proposes an unsupervised learning method for the wave
equation based on finite difference residual constraints. We construct a novel
finite difference residual constraint based on structured grids and finite
difference methods, as well as an unsupervised training strategy, enabling
convolutional neural networks to train without data and predict the forward
propagation process of waves. Experimental results show that finite difference
residual constraints have advantages over physics-informed neural networks
(PINNs) type physical information constraints, such as easier fitting, lower
computational costs, and stronger source-term generalization, making our
method more efficient to train and more powerful in application.
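A finite difference residual of the 1-D wave equation on a structured grid is an example of such a data-free training signal: the network's output is penalized wherever the discretized equation is violated. A minimal NumPy sketch (the paper's actual discretization and dimensionality may differ):

```python
import numpy as np

def wave_fd_residual(u, c, dt, dx):
    """Finite-difference residual of the 1-D wave equation
    u_tt = c^2 u_xx on a structured grid u[t, x]. Driving this residual
    to zero is an unsupervised training signal: no labeled solution
    data is needed."""
    u_tt = (u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]) / dt ** 2
    u_xx = (u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]) / dx ** 2
    return u_tt - c ** 2 * u_xx
```

Because the residual is a fixed stencil rather than an autodiff graph through the network inputs, evaluating it is cheap, which is the computational advantage claimed over PINN-style constraints.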
Register allocation is one of the most important problems for modern
compilers. With a practically unlimited number of user variables and a small
number of CPU registers, assigning variables to registers without conflicts is
a complex task. This work casts the register allocation problem as a graph
coloring problem. Using technologies such as
PyTorch and OpenAI Gymnasium Environments we will show that a Proximal Policy
Optimization model can learn to solve the graph coloring problem. We will also
show that the labeling of a graph is critical to the performance of the model
by taking the matrix representation of a graph and permuting it. We then test
the model's effectiveness on each of these permutations and show that it is not
effective when given a relabeling of the same graph. Our main contribution lies
in showing the need for label reordering invariant representations of graphs
for machine learning models to achieve consistent performance.
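Greedy coloring of the interference graph illustrates both the casting and the labeling sensitivity discussed above. This is the classical baseline, not the paper's PPO model:

```python
import numpy as np

def greedy_coloring(adj):
    """Greedy coloring of an interference graph: each variable (node)
    gets the lowest color (register) not used by its already-colored
    neighbors. The result depends on the node labeling order, which
    illustrates why label-order invariance matters for learned
    allocators."""
    n = adj.shape[0]
    colors = -np.ones(n, dtype=int)
    for v in range(n):                      # visit nodes in label order
        taken = {colors[u] for u in range(n) if adj[v, u] and colors[u] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[v] = c
    return colors
```

Permuting the adjacency matrix changes the visit order and can change the coloring, mirroring the paper's finding that a relabeled graph defeats the trained model.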
The MagNet Challenge 2023 calls upon competitors to develop data-driven
models for the material-specific, waveform-agnostic estimation of steady-state
power losses in toroidal ferrite cores. The following HARDCORE (H-field and
power loss estimation for Arbitrary waveforms with Residual, Dilated
convolutional neural networks in ferrite COREs) approach shows that a residual
convolutional neural network with physics-informed extensions can serve this
task efficiently when trained on observational data beforehand. One key
solution element is an intermediate model layer which first reconstructs the
B-H curve and then estimates the power losses based on the curve's area,
rendering the proposed topology physically interpretable. In addition, emphasis
was placed on expert-based feature engineering and information-rich inputs in
order to enable a lean model architecture. A model is trained from scratch for
each material, while the topology remains the same. A Pareto-style trade-off
between model size and estimation accuracy is demonstrated, which yields an
optimum at as low as 1755 parameters and below 8% relative error at the 95th
percentile for the worst-case material with sufficient samples.
This paper presents a $\delta$-PI algorithm based on the damped Newton method
for the $H_{\infty}$ tracking control problem of unknown continuous-time
nonlinear systems. A discounted performance function and an augmented system
are used to obtain the tracking Hamilton-Jacobi-Isaacs (HJI) equation. The
tracking HJI equation is a nonlinear partial differential equation; traditional
reinforcement learning methods for solving it are mostly based on the Newton
method, which usually only satisfies local convergence and needs a good initial
guess. Based upon the damped Newton iteration operator equation, a generalized
tracking Bellman equation is first derived. The $\delta$-PI algorithm seeks the
optimal solution of the tracking HJI equation by iteratively solving the
generalized tracking Bellman equation. On-policy and off-policy $\delta$-PI
reinforcement learning methods are provided, respectively. The off-policy
$\delta$-PI algorithm is model-free and can be performed without a priori
knowledge of the system dynamics. An NN-based implementation scheme for the
off-policy $\delta$-PI algorithm is shown. The suitability of the model-free
$\delta$-PI algorithm is illustrated with a nonlinear system simulation.
An innovative methodology that leverages artificial intelligence (AI) and
graph representation for semiconductor device encoding in TCAD device
simulation is proposed. A graph-based universal encoding scheme is presented
that not only considers material-level and device-level embeddings, but also
introduces a novel spatial relationship embedding inspired by interpolation
operations typically used in finite element meshing. Universal physical laws
from device simulations are leveraged for comprehensive data-driven modeling,
which encompasses surrogate Poisson emulation and current-voltage (IV)
prediction based on the drift-diffusion model. Both are achieved using a novel
graph attention network, referred to as RelGAT. Comprehensive technical details
based on the device simulator Sentaurus TCAD are presented, empowering
researchers to adopt the proposed AI-driven Electronic Design Automation (EDA)
solution at the device level.
( 2
min )
The electrophysiological nature of neuronal networks makes it possible to
reveal interactions between different cell units at very short time scales. One
of the many challenges in analyzing these signals is to retrieve the morphology
and functionality of a given network. In this work we developed a computational
model, based on the Reservoir Computing Network (RCN) architecture, which
decodes the spatio-temporal data from electrophysiological measurements of
neuronal cultures and reconstructs the network structure on a macroscopic
domain, representing the connectivity between neuronal units. We demonstrate
that the model can predict the connectivity map of the network with higher
accuracy than common methods such as cross-correlation and transfer entropy. In
addition, we experimentally demonstrate the ability of the model to predict a
network's response to a specific input, such as a localized stimulus.
( 2
min )
This paper develops a new variant of the two-time-scale stochastic
approximation to find the roots of two coupled nonlinear operators, assuming
only noisy samples of these operators can be observed. Our key idea is to
leverage the classic Ruppert-Polyak averaging technique to dynamically estimate
the operators through their samples. The estimated values of these averaging
steps will then be used in the two-time-scale stochastic approximation updates
to find the desired solution. Our main theoretical result is to show that under
the strongly monotone condition of the underlying nonlinear operators, the
mean-squared errors of the iterates generated by the proposed method converge
to zero at an optimal rate $\mathcal{O}(1/k)$, where $k$ is the number of
iterations. Our result significantly improves the existing result of
two-time-scale stochastic approximation, where the best known finite-time
convergence rate is $\mathcal{O}(1/k^{2/3})$.
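The Ruppert-Polyak averaging ingredient can be sketched on a toy one-dimensional stochastic-approximation problem; the operator, noise level, and step-size schedule below are illustrative assumptions, not the paper's coupled two-operator setting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Find the root of F(x) = x - 2 from noisy samples F(x) + noise.
# The Polyak-averaged iterate attains the optimal O(1/k) MSE rate.
x, x_bar = 0.0, 0.0
K = 20000
for k in range(1, K + 1):
    noisy_F = (x - 2.0) + rng.normal(scale=1.0)
    x -= noisy_F / k**0.7            # slowly decaying step size
    x_bar += (x - x_bar) / k         # running Ruppert-Polyak average
print(x_bar)  # close to the root 2.0
```

The raw iterate `x` keeps fluctuating at the scale of the step size, while the running average smooths the noise out; this averaging of sampled quantities is the key idea the abstract leverages inside the two-time-scale updates.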
( 2
min )
We consider the design of fast and reliable neural network (NN)-based
approximations of traditional stabilizing controllers for linear systems with
polytopic uncertainty, including control laws with variable structure and those
based on a (minimal) selection policy. Building upon recent approaches for the
design of reliable control surrogates with guaranteed structural properties, we
develop a systematic procedure to certify the closed-loop stability and
performance of a linear uncertain system when a trained rectified linear unit
(ReLU)-based approximation replaces such traditional controllers. First, we
provide a sufficient condition, which involves the worst-case approximation
error between ReLU-based and traditional controller-based state-to-input
mappings, ensuring that the system is ultimately bounded within a set with
adjustable size and convergence rate. Then, we develop an offline,
mixed-integer optimization-based method that allows us to compute that quantity
exactly.
( 2
min )
Natural language processing has made progress in incorporating human context
into its models, but whether it is more effective to use group-wise attributes
(e.g., over-45-year-olds) or model individuals remains open. Group attributes
are technically easier but coarse: not all 45-year-olds write the same way. In
contrast, modeling individuals captures the complexity of each person's
identity. It allows for a more personalized representation, but it may require
modeling a practically unbounded number of users and data that may be
impossible to obtain. We compare modeling human context via group attributes,
and combined approaches. Combining group and individual features significantly
benefits user-level regression tasks like age estimation or personality
assessment from a user's documents. Modeling individual users significantly
improves the performance of single document-level classification tasks like
stance and topic detection. We also find that individual-user modeling does
well even without a user's historical data.
( 2
min )
Inspired by regularization techniques in statistics and machine learning, we
study complementary composite minimization in the stochastic setting. This
problem corresponds to the minimization of the sum of a (weakly) smooth
function endowed with a stochastic first-order oracle, and a structured
uniformly convex (possibly nonsmooth and non-Lipschitz) regularization term.
Despite intensive work on closely related settings, prior to our work no
complexity bounds for this problem were known. We close this gap by providing
novel excess risk bounds, both in expectation and with high probability. Our
algorithms are nearly optimal, which we prove via novel lower complexity bounds
for this class of problems. We conclude by providing numerical results
comparing our methods to the state of the art.
( 2
min )
Reinforcement learning is an emerging approach for multi-stage sequential
decision-making problems. This paper studies real-time multi-stage stochastic
power dispatch under multivariate uncertainties. Current research suffers from
low generalization and practicality: the learned dispatch policy can only
handle a specific dispatch scenario, and its performance degrades significantly
if actual samples and training samples are inconsistent. To fill these gaps, a
novel contextual meta graph reinforcement learning (Meta-GRL) approach for a
highly generalized multi-stage optimal dispatch policy is proposed.
Specifically, a more general contextual Markov decision process (MDP) and a
scalable graph representation are introduced to achieve more generalized
multi-stage stochastic power dispatch modeling. An upper meta-learner is
proposed to encode context for different dispatch scenarios and achieve
dispatch task identification, while the lower policy learner learns a
context-specific dispatch policy. After sufficient offline learning, this
approach can rapidly adapt to unseen and undefined scenarios with only a few
updates of the hypothesis judgments generated by the meta-learner. Numerical
comparisons with state-of-the-art policies and traditional reinforcement
learning verify the optimality, efficiency, adaptability, and scalability of
the proposed Meta-GRL.
( 2
min )
Accurate uncertainty measurement is a key step to building robust and
reliable machine learning systems. Conformal prediction is a distribution-free
uncertainty quantification algorithm popular for its ease of implementation,
statistical coverage guarantees, and versatility for underlying forecasters.
However, existing conformal prediction algorithms for time series are limited
to single-step prediction without considering the temporal dependency. In this
paper we propose a Copula Conformal Prediction algorithm for multivariate,
multi-step Time Series forecasting, CopulaCPTS. We prove that CopulaCPTS has a
finite-sample validity guarantee. On several synthetic and real-world
multivariate time series datasets, we show that CopulaCPTS produces sharper and
better-calibrated confidence intervals for multi-step prediction tasks than
existing techniques.
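The single-step building block that CopulaCPTS extends is basic split conformal prediction; this sketch (toy forecaster, score function, and coverage level are illustrative assumptions) is not the authors' multivariate algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def conformal_interval(preds_cal, y_cal, pred_test, alpha=0.1):
    """Return an interval with >= 1 - alpha finite-sample coverage."""
    scores = np.abs(y_cal - preds_cal)           # nonconformity scores
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)
    return pred_test - q, pred_test + q

# Toy forecaster: predict y = 2x from noisy observations.
x_cal = rng.uniform(0, 1, 500)
y_cal = 2 * x_cal + rng.normal(scale=0.1, size=500)
lo, hi = conformal_interval(2 * x_cal, y_cal, pred_test=2 * 0.5)
print(lo < 1.0 < hi)  # True: interval covers the noise-free target
```

The coverage guarantee is distribution-free: it relies only on exchangeability of calibration and test scores, not on the forecaster being correct, which is the property the abstract highlights.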
( 2
min )
Deep learning (DL) is gaining popularity as a parameter estimation method for
quantitative MRI. A range of competing implementations have been proposed,
relying on either supervised or self-supervised learning. Self-supervised
approaches, sometimes referred to as unsupervised, have been loosely based on
auto-encoders, whereas supervised methods have, to date, been trained on
groundtruth labels. These two learning paradigms have been shown to have
distinct strengths. Notably, self-supervised approaches have offered lower-bias
parameter estimates than their supervised alternatives. This result is
counterintuitive - incorporating prior knowledge with supervised labels should,
in theory, lead to improved accuracy. In this work, we show that this apparent
limitation of supervised approaches stems from the naive choice of groundtruth
training labels. By training on labels which are deliberately not groundtruth,
we show that the low-bias parameter estimation previously associated with
self-supervised methods can be replicated - and improved on - within a
supervised learning framework. This approach sets the stage for a single,
unifying, deep learning parameter estimation framework, based on supervised
learning, where trade-offs between bias and variance are made by careful
adjustment of the training labels.
( 3
min )
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by
mining papers and generating hypotheses. Standard LBD is limited to predicting
pairwise relations between discrete concepts (e.g., drug-disease links), and
ignores critical contexts like experimental settings (e.g., a specific patient
population where a drug is evaluated) and background motivations (e.g., to find
drugs without specific side effects). We address these limitations with a novel
formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in
natural language, while grounding them in a context that controls the
hypothesis search space. We present a modeling framework using retrieval of
``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4
tends to generate ideas with overall low technical depth and novelty, while our
inspiration prompting approaches partially mitigate this issue. Our work
represents a first step toward building language models that generate new ideas
derived from scientific literature.
( 2
min )
We introduce and investigate the iterated application of Generalized Matrix
Learning Vector Quantization for the analysis of feature relevances in
classification problems, as well as for the construction of
class-discriminative subspaces. The suggested Iterated Relevance Matrix
Analysis (IRMA) identifies a linear subspace representing the classification
specific information of the considered data sets using Generalized Matrix
Learning Vector Quantization (GMLVQ). By iteratively determining a new
discriminative subspace while projecting out all previously identified ones, a
combined subspace carrying all class-specific information can be found. This
facilitates a detailed analysis of feature relevances, and enables improved
low-dimensional representations and visualizations of labeled data sets.
Additionally, the IRMA-based class-discriminative subspace can be used for
dimensionality reduction and the training of robust classifiers with
potentially improved performance.
( 2
min )
In this work, we compare emergent communication (EC) built upon multi-agent
deep reinforcement learning (MADRL) and language-oriented semantic
communication (LSC) empowered by a pre-trained large language model (LLM) using
human language. In a multi-agent remote navigation task, with multimodal input
data comprising location and channel maps, it is shown that EC incurs high
training cost and struggles when using multimodal data, whereas LSC yields high
inference computing cost due to the LLM's large size. To address their
respective bottlenecks, we propose a novel framework of language-guided EC
(LEC) by guiding the EC training using LSC via knowledge distillation (KD).
Simulations corroborate that LEC achieves faster travel time while avoiding
areas with poor channel conditions, as well as speeding up the MADRL training
convergence by up to 61.8% compared to EC.
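The knowledge-distillation mechanism used to guide EC training can be sketched with the standard softened-logit KD loss; the temperature and logits below are illustrative assumptions, not the paper's LSC teacher outputs.

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = [2.0, 1.0, 0.1]
print(round(kd_loss(teacher, teacher), 6))      # 0.0: identical logits
print(kd_loss([0.0, 0.0, 0.0], teacher) > 0)    # True: mismatch penalized
```

Minimizing this term pulls the student's (here, the EC agent's) output distribution toward the teacher's, which is how LSC knowledge transfers without paying the LLM's inference cost at deployment.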
( 2
min )
As social media platforms are evolving from text-based forums into
multi-modal environments, the nature of misinformation in social media is also
transforming accordingly. Taking advantage of the fact that visual modalities
such as images and videos are more appealing and attractive to users, while
textual content is often skimmed carelessly, misinformation spreaders have
recently targeted contextual connections between the modalities, e.g., text and
image. Hence, many researchers have developed automatic techniques for
detecting possible cross-modal discordance in web-based content. We analyze,
categorize and identify existing approaches in addition to challenges and
shortcomings they face in order to unearth new research opportunities in the
field of multi-modal misinformation detection.
( 2
min )
Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is
crucial for understanding tumor growth dynamics and designing personalized
radiotherapy treatment plans. Mathematical models of GBM growth can complement
the data in the prediction of spatial distributions of tumor cells. However,
this requires estimating patient-specific parameters of the model from clinical
data, which is a challenging inverse problem due to limited temporal data and
the limited time between imaging and diagnosis. This work proposes a method
that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific
parameters of a reaction-diffusion PDE model of GBM growth from a single 3D
structural MRI snapshot. PINNs embed both the data and the PDE into a loss
function, thus integrating theory and data. Key innovations include the
identification and estimation of characteristic non-dimensional parameters, a
pre-training step that utilizes the non-dimensional parameters, and a
fine-tuning step to determine the patient-specific parameters. Additionally,
the diffuse domain method is employed to handle the complex brain geometry
within the PINN framework. Our method is validated both on synthetic and
patient datasets, and shows promise for real-time parametric inference in the
clinical setting for personalized GBM treatment.
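The physics term of such a PINN loss can be sketched for the Fisher-KPP reaction-diffusion model, u_t = D u_xx + rho u (1 - u), a standard form of reaction-diffusion tumor growth; here the residual is evaluated with finite differences on a 1D grid for illustration, whereas actual PINNs differentiate the network itself via autograd, and the grid and parameters are arbitrary assumptions.

```python
import numpy as np

def pde_residual(u, dt, dx, D, rho):
    """Residual of u_t - D u_xx - rho u (1 - u) on interior grid points."""
    u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
    u_xx = (u[:-1, 2:] - 2 * u[:-1, 1:-1] + u[:-1, :-2]) / dx**2
    reaction = rho * u[:-1, 1:-1] * (1 - u[:-1, 1:-1])
    return u_t - D * u_xx - reaction

# A constant field solves the PDE only when the reaction term vanishes.
u = np.zeros((10, 20))          # u = 0 everywhere: exact steady state
res = pde_residual(u, dt=0.01, dx=0.1, D=0.1, rho=1.0)
print(float(np.abs(res).max()))  # 0.0
```

In a PINN, the squared residual is added to the data-misfit term, and the parameters D and rho become trainable alongside the network weights, which is what enables patient-specific inference from a single snapshot.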
( 3
min )
Recently, how to introduce large amounts of unlabeled facial images in the
wild into supervised Facial Action Unit (AU) detection frameworks has become a
challenging problem. In this paper, we propose a new AU detection framework
where multi-task learning is introduced to jointly learn AU domain separation
and reconstruction and facial landmark detection by sharing the parameters of
homostructural facial extraction modules. In addition, we propose a new feature
alignment scheme based on contrastive learning by simple projectors and an
improved contrastive loss, which adds four additional intermediate supervisors
to promote the feature reconstruction process. Experimental results on two
benchmarks demonstrate our superiority against the state-of-the-art methods for
AU detection in the wild.
( 2
min )
Graph-based collaborative filtering methods achieve strong performance for
recommender systems because they can capture high-order information between
users and items. However, the graphs are constructed from observed user-item
interactions, which might miss links or contain spurious positive interactions
in industrial scenarios. The Bayesian Graph Neural Network framework approaches
this issue with generative models for the interaction graphs. The critical
problem is to devise a proper family of graph generative models tailored to
recommender systems. We propose an efficient generative model that jointly
considers the preferences of users, the concurrence of items and some important
graph structure information. Experiments on four popular benchmark datasets
demonstrate the effectiveness of our proposed graph generative methods for
recommender systems.
( 2
min )
Phosphorus removal is vital in wastewater treatment to reduce reliance on
limited resources. Deep reinforcement learning (DRL) is a machine learning
technique that can optimize complex and nonlinear systems, including the
processes in wastewater treatment plants, by learning control policies through
trial and error. However, applying DRL to chemical and biological processes is
challenging due to the need for accurate simulators. This study trained six
models to identify the phosphorus removal process and used them to create a
simulator for the DRL environment. Although the models achieved high accuracy
(>97%), uncertainty and incorrect prediction behavior limited their performance
as simulators over longer horizons. Compounding errors in the models'
predictions were identified as one of the causes of this problem. The proposed
approach to improving process control creates simulation environments for DRL
algorithms from supervisory control and data acquisition (SCADA) data with a
sufficient historical horizon, without complex system modeling or parameter
estimation.
( 2
min )
Deep learning models have become increasingly popular for flood prediction
due to their superior accuracy and efficiency compared to traditional methods.
However, current machine learning methods often rely on separate spatial or
temporal feature analysis and have limitations on the types, number, and
dimensions of input data. This study presented a CNN-RNN hybrid feature fusion
modelling approach for urban flood prediction, which integrated the strengths
of CNNs in processing spatial features and RNNs in analyzing different
dimensions of time sequences. This approach allowed for both static and dynamic
flood predictions. Bayesian optimization was applied to identify the seven most
influential flood-driven factors and determine the best combination strategy.
By combining four CNNs (FCN, UNet, SegNet, DeepLabv3+) and three RNNs (LSTM,
BiLSTM, GRU), the optimal hybrid model was identified as LSTM-DeepLabv3+. This
model achieved the highest prediction accuracy (MAE, RMSE, NSE, and KGE were
0.007, 0.025, 0.973 and 0.755, respectively) under various rainfall input
conditions. Additionally, the processing speed was significantly improved, with
an inference time of 1.158 s, approximately 1/125 of the computation time of
traditional physically-based models.
( 2
min )
In the era of the Internet of Things (IoT), decentralized paradigms for
machine learning are gaining prominence. In this paper, we introduce a
federated learning model that capitalizes on the Euclidean distance between
device model weights to assess their similarity and disparity. This is
foundational for our system, directing the formation of coalitions among
devices based on the closeness of their model weights. Furthermore, the concept
of a barycenter, representing the average of model weights, helps in the
aggregation of updates from multiple devices. We evaluate our approach under
homogeneous and heterogeneous data distributions, comparing it against the
traditional federated averaging algorithm. Numerical results demonstrate its
potential in offering a structured, better-performing, and
communication-efficient model for IoT-based machine learning.
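The two ingredients the abstract names, Euclidean distance between weight vectors and barycenter aggregation, can be sketched as follows; the greedy grouping rule and the radius threshold are illustrative assumptions, not the paper's coalition-formation procedure.

```python
import numpy as np

def coalitions(weights, radius):
    """Greedily group weight vectors within `radius` of a seed device."""
    groups, used = [], set()
    for i, w in enumerate(weights):
        if i in used:
            continue
        group = [j for j in range(len(weights)) if j not in used
                 and np.linalg.norm(weights[j] - w) <= radius]
        used.update(group)
        groups.append(group)
    return groups

# Four devices: two pairs with similar model weights.
weights = [np.array([0.0, 0.0]), np.array([0.1, 0.0]),
           np.array([5.0, 5.0]), np.array([5.1, 5.0])]
groups = coalitions(weights, radius=1.0)
barycenters = [np.mean([weights[j] for j in g], axis=0) for g in groups]
print(groups)                    # [[0, 1], [2, 3]]
print(barycenters[0].tolist())   # [0.05, 0.0]
```

Aggregating within a coalition by its barycenter (the coordinate-wise mean of member weights) keeps dissimilar devices from pulling each other's models apart, which is the motivation for distance-based grouping under heterogeneous data.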
( 2
min )
We develop a versatile framework for statistical learning in non-stationary
environments. In each time period, our approach applies a stability principle
to select a look-back window that maximizes the utilization of historical data
while keeping the cumulative bias within an acceptable range relative to the
stochastic error. Our theory showcases the adaptability of this approach to
unknown non-stationarity. The regret bound is minimax optimal up to logarithmic
factors when the population losses are strongly convex, or Lipschitz only. At
the heart of our analysis lie two novel components: a measure of similarity
between functions and a segmentation technique for dividing the non-stationary
data sequence into quasi-stationary pieces.
( 2
min )
Neural Architecture Search (NAS) has become the de-facto approach for
designing accurate and efficient networks for edge devices. Since models are
typically quantized for edge deployment, recent work has investigated
quantization-aware NAS (QA-NAS) to search for highly accurate and efficient
quantized models. However, existing QA-NAS approaches, particularly few-bit
mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently,
QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this
work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale
tasks by leveraging the block-wise formulation introduced by block-wise NAS. We
demonstrate strong results for the semantic segmentation task on the Cityscapes
dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than
DeepLabV3 (INT8) without compromising task performance.
( 2
min )
In this paper, we present a novel bilevel optimization-based approach to
training acoustic models for automatic speech recognition (ASR) tasks that we
term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST
employs lower- and upper-level optimization with an unsupervised loss and a
supervised loss, respectively, leveraging recent advances in penalty-based
bilevel optimization to solve this challenging ASR problem with affordable
complexity and rigorous convergence guarantees. To evaluate BL-JUST, extensive
experiments were conducted on the LibriSpeech and TED-LIUM v2 datasets.
BL-JUST achieves superior performance over the commonly used pre-training
followed by fine-tuning strategy.
( 2
min )
In this study, we investigated the potential of GPT-3 for the anti-cancer
drug sensitivity prediction task using structured pharmacogenomics data across
five tissue types and evaluated its performance with zero-shot prompting and
fine-tuning paradigms. The drug's SMILES representation and the cell line's
genomic mutation features were predictive of the drug response. The results
from this
study have the potential to pave the way for designing more efficient treatment
protocols in precision oncology.
( 2
min )
Interactive Machine Learning (IML) seeks to integrate human expertise into
machine learning processes. However, most existing algorithms cannot be applied
to real-world scenarios because their state spaces and/or action spaces are
limited to discrete values. Furthermore, the interaction of all existing
methods is restricted to deciding between multiple proposals. We therefore
propose a novel framework based on Bayesian Optimization (BO). Interactive
Bayesian Optimization (IBO) enables collaboration between machine learning
algorithms and humans. This framework captures user preferences and provides an
interface for users to shape the strategy by hand. Additionally, we incorporate
a new acquisition function, Preference Expected Improvement (PEI), to refine
the system's efficiency using a probabilistic model of the user preferences.
Our approach is geared towards ensuring that machines can benefit from human
expertise, aiming for a more aligned and effective learning process. In the
course of this work, we applied our method in simulations and in a real-world
task using a Franka Panda robot to demonstrate human-robot collaboration.
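The acquisition function that PEI builds on is standard Expected Improvement (EI); the sketch below shows plain EI under a Gaussian posterior (PEI's preference model is not shown, and the test values are arbitrary assumptions).

```python
import math

def expected_improvement(mu, sigma, best, xi=0.01):
    """EI for a minimization problem under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0:
        return 0.0
    z = (best - mu - xi) / sigma
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal PDF
    return (best - mu - xi) * cdf + sigma * pdf

# A point predicted below the incumbent best has higher EI.
print(expected_improvement(mu=0.5, sigma=0.2, best=1.0) >
      expected_improvement(mu=1.2, sigma=0.2, best=1.0))  # True
```

EI balances exploitation (low predicted mean) against exploration (high posterior uncertainty); PEI, as described in the abstract, additionally weights candidates by a probabilistic model of user preferences.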
( 2
min )
The aim of this work is to create and apply a methodological approach for
predicting gas traps from 3D seismic data and gas well testing. The paper
formalizes the approach to creating a training dataset by selecting volumes
with established gas saturation and filtration properties within the seismic
wavefield. The training dataset thus created is used in a process stack of
sequential application of data processing methods and ensemble machine learning
algorithms. As a result, a cube of calibrated probabilities that the study
space belongs to gas reservoirs was obtained. The high efficiency of this
approach is shown on a held-out test sample of three (blind) wells. The final
value of the gas reservoir prediction quality metric, the F1 score, was
0.893846.
( 2
min )
Urban region embedding is an important yet highly challenging task due to the
complexity and constantly changing nature of urban data. To address these
challenges, we propose Region-Wise Multi-View Representation Learning (ROMER)
to capture multi-view dependencies and learn expressive representations of
urban regions without the constraints of rigid neighbourhood region conditions.
Our model focuses on learning urban region representations from multi-source
urban data. First, we capture multi-view correlations from mobility flow
patterns, POI semantics, and check-in dynamics. Then, we adopt global graph
attention networks to learn the similarity of any two vertices in graphs. To
comprehensively consider and share features across multiple views, a two-stage
fusion module is further proposed that learns weights with external attention
to fuse multi-view embeddings. Extensive experiments on two downstream tasks
with real-world datasets demonstrate that our model outperforms
state-of-the-art methods by up to 17\%.
( 2
min )
Bayesian Neural Networks (BayNNs) naturally provide uncertainty in their
predictions, making them a suitable choice in safety-critical applications.
Additionally, their realization using memristor-based in-memory computing (IMC)
architectures makes them suitable for resource-constrained edge applications.
In
addition to predictive uncertainty, however, the ability to be inherently
robust to noise in computation is also essential to ensure functional safety.
In particular, memristor-based IMCs are susceptible to various sources of
non-idealities such as manufacturing and runtime variations, drift, and
failure, which can significantly reduce inference accuracy. In this paper, we
propose a method to inherently enhance the robustness and inference accuracy of
BayNNs deployed in IMC architectures. To achieve this, we introduce a novel
normalization layer combined with stochastic affine transformations. Empirical
results in various benchmark datasets show a graceful degradation in inference
accuracy, with an improvement of up to $58.11\%$.
( 2
min )
In this paper, we explore low-power custom quantised Multi-Layer Perceptrons
(MLPs) as an Intrusion Detection System (IDS) for automotive controller area
network (CAN). We utilise the FINN framework from AMD/Xilinx to quantise, train
and generate a hardware IP of our MLP to detect denial-of-service (DoS) and
fuzzing attacks on the CAN network, using the ZCU104 (XCZU7EV) FPGA as our
target ECU architecture with integrated IDS capabilities. Our approach achieves
significant improvements in latency (0.12 ms per-message processing latency)
and inference energy consumption (0.25 mJ per inference) while achieving
similar classification performance as state-of-the-art approaches in the
literature.
( 2
min )
We argue that insurance can act as an analogon for the social situatedness of
machine learning systems, hence allowing machine learning scholars to take
insights from the rich and interdisciplinary insurance literature. Tracing the
interaction of uncertainty, fairness and responsibility in insurance provides a
fresh perspective on fairness in machine learning. We link insurance fairness
conceptions to their machine learning relatives, and use this bridge to
problematize fairness as calibration. In this process, we bring to the
forefront two themes that have been largely overlooked in the machine learning
literature: responsibility and aggregate-individual tensions.
( 2
min )
We present the first mini-batch algorithm for maximizing a non-negative
monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of
constraints. We improve over the sparsifier based approach both in theory and
in practice. We experimentally observe that our algorithm generates solutions
that are far superior to those generated by the sparsifier based approach.
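The mini-batch idea can be sketched for a cardinality-constrained toy problem: each greedy step scores marginal gains on a random batch of the component functions $f^i$ rather than all $N$. The batch size, the coverage-style $f^i$, and the greedy subroutine are illustrative assumptions, not the paper's algorithm or constraint set.

```python
import random

random.seed(0)

def minibatch_greedy(ground, fs, k, batch_size):
    """Greedy maximization of F(S) = sum_i f^i(S), |S| <= k,
    estimating marginal gains on a random mini-batch of the f^i."""
    S = set()
    for _ in range(k):
        batch = random.sample(fs, batch_size)
        gain = lambda e: sum(f(S | {e}) - f(S) for f in batch)
        best = max((e for e in ground if e not in S), key=gain)
        S.add(best)
    return S

# Toy decomposable coverage objective: f^i(S) = min(|S ∩ A_i|, 1).
sets = [{0, 1}, {1, 2}, {2, 3}, {3, 4}]
fs = [lambda S, A=A: min(len(S & A), 1) for A in sets]
S = minibatch_greedy(ground=range(5), fs=fs, k=2, batch_size=2)
print(len(S))  # 2: cardinality constraint respected
```

Evaluating gains on a batch of size b instead of all N components cuts the per-step cost by a factor of N/b, which is the source of the practical speedup the abstract reports over the sparsifier-based approach.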
( 2
min )
We respond to the recent paper by Makelov et al. (2023), which reviews
subspace interchange intervention methods like distributed alignment search
(DAS; Geiger et al. 2023) and claims that these methods potentially cause
"interpretability illusions". We first review Makelov et al. (2023)'s technical
notion of what an "interpretability illusion" is, and then we show that even
intuitive and desirable explanations can qualify as illusions in this sense. As
a result, their method of discovering "illusions" can reject explanations they
consider "non-illusory". We then argue that the illusions Makelov et al. (2023)
see in practice are artifacts of their training and evaluation paradigms. We
close by emphasizing that, though we disagree with their core characterization,
Makelov et al. (2023)'s examples and discussion have undoubtedly pushed the
field of interpretability forward.
( 2
min )
Sound event localization and detection (SELD) is an important task in machine
listening. Major advancements rely on simulated data with sound events in
specific rooms and strong spatio-temporal labels. SELD data are simulated by
convolving spatially localized room impulse responses (RIRs) with sound
waveforms to place sound events in a soundscape. However, RIRs require manual
collection in specific rooms. We present SpatialScaper, a library for SELD data
simulation and augmentation. Compared to existing tools, SpatialScaper emulates
virtual rooms via parameters such as size and wall absorption. This allows for
parameterized placement (including movement) of foreground and background sound
sources. SpatialScaper also includes data augmentation pipelines that can be
applied to existing SELD data. As a case study, we use SpatialScaper to add
rooms to the DCASE SELD data. Training a model with our data led to progressive
performance improvements as a direct function of acoustic diversity. These
results show that SpatialScaper is valuable for training robust SELD models.
( 2
min )
Transfer learning for nonparametric regression is considered. We first study
the non-asymptotic minimax risk for this problem and develop a novel estimator
called the confidence thresholding estimator, which is shown to achieve the
minimax optimal risk up to a logarithmic factor. Our results demonstrate two
unique phenomena in transfer learning: auto-smoothing and super-acceleration,
which differentiate it from nonparametric regression in a traditional setting.
We then propose a data-driven algorithm that adaptively achieves the minimax
risk up to a logarithmic factor across a wide range of parameter spaces.
Simulation studies are conducted to evaluate the numerical performance of the
adaptive transfer learning algorithm, and a real-world example is provided to
demonstrate the benefits of the proposed method.
( 2
min )
In the modern transportation industry, accurate prediction of travelers' next
destinations brings multiple benefits to companies, such as customer
satisfaction and targeted marketing. This study focuses on developing a precise
model that captures the sequential patterns and dependencies in travel data,
enabling accurate predictions of individual travelers' future destinations. To
achieve this, a novel model architecture with a sliding window approach based
on Long Short-Term Memory (LSTM) is proposed for destination prediction in the
transportation industry. The experimental results highlight satisfactory
performance and high scores achieved by the proposed model across different
data sizes and performance metrics. This research contributes to advancing
destination prediction methods, empowering companies to deliver personalized
recommendations and optimize customer experiences in the dynamic travel
landscape.
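The data-preparation step behind such a sliding-window LSTM can be sketched as follows; the window length and destination IDs are arbitrary illustrative choices, not the paper's configuration.

```python
import numpy as np

def sliding_windows(trip_sequence, window=3):
    """Turn one traveler's trip history into (history, next-stop) pairs."""
    X, y = [], []
    for i in range(len(trip_sequence) - window):
        X.append(trip_sequence[i:i + window])     # last `window` stops
        y.append(trip_sequence[i + window])       # next destination
    return np.array(X), np.array(y)

trips = [3, 1, 4, 1, 5, 9, 2]                     # destination IDs
X, y = sliding_windows(trips)
print(X.tolist())  # [[3, 1, 4], [1, 4, 1], [4, 1, 5], [1, 5, 9]]
print(y.tolist())  # [1, 5, 9, 2]
```

Each pair then feeds a sequence model (here an LSTM) that learns to map a fixed-length history of destinations to the next one.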
( 2
min )
Object Detection (OD) has proven to be a significant computer vision method
in extracting localized class information and has multiple applications in the
industry. Although many of the state-of-the-art (SOTA) OD models perform well
on medium and large sized objects, they seem to under perform on small objects.
In most of the industrial use cases, it is difficult to collect and annotate
data for small objects, as it is time-consuming and prone to human errors.
Additionally, those datasets are likely to be unbalanced and often result in an
inefficient model convergence. To tackle this challenge, this study presents a
novel approach that injects additional data points to improve the performance
of the OD models. Using synthetic data generation, the difficulties in data
collection and annotations for small object data points can be minimized and to
create a dataset with balanced distribution. This paper discusses the effects
of a simple proportional class-balancing technique, to enable better anchor
matching of the OD models. A comparison was carried out on the performances of
the SOTA OD models: YOLOv5, YOLOv7 and SSD, for combinations of real and
synthetic datasets within an industrial use case.
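A minimal reading of "proportional class balancing" is to top every class up to the size of the largest one with synthetic samples; the quota rule and the class names below are assumptions for illustration:

```python
def synthetic_quota(class_counts):
    """Number of synthetic samples to generate per class so that every class
    matches the largest class (one simple proportional balancing rule)."""
    target = max(class_counts.values())
    return {cls: target - n for cls, n in class_counts.items()}

# Hypothetical small-object classes from an industrial dataset.
print(synthetic_quota({"bolt": 40, "nut": 250, "washer": 110}))
```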
( 3
min )
We report results of a longitudinal sentiment classification of Reddit posts
written by students of four major Canadian universities. We work with the texts
of the posts, concentrating on the years 2020-2023. By fine-tuning a
sentiment threshold to the range [-0.075, 0.075], we built classifiers
proficient in categorizing post sentiments into positive and
negative categories. Notably, our sentiment classification results are
consistent across the four university data sets.
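The threshold rule can be made concrete as follows; how the study treats scores falling inside the (-0.075, 0.075) band is not stated, so labeling them "neutral" here is an assumption:

```python
def label_sentiment(score, lo=-0.075, hi=0.075):
    """Map a continuous sentiment score to a discrete label using the tuned
    threshold band; scores strictly inside (lo, hi) are treated as neutral."""
    if score >= hi:
        return "positive"
    if score <= lo:
        return "negative"
    return "neutral"

print([label_sentiment(s) for s in (0.4, -0.2, 0.01)])
```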
( 2
min )
We consider a regularized expected reward optimization problem in the
non-oblivious setting that covers many existing problems in reinforcement
learning (RL). In order to solve such an optimization problem, we apply and
analyze the classical stochastic proximal gradient method. In particular, the
method has been shown to admit an $O(\epsilon^{-4})$ sample complexity to an
$\epsilon$-stationary point, under standard conditions. Since the variance of
the classical stochastic gradient estimator is typically large, which slows down
the convergence, we also apply an efficient stochastic variance-reduced proximal
gradient method with an importance sampling based ProbAbilistic Gradient
Estimator (PAGE). To the best of our knowledge, the application of this method
represents a novel approach in addressing the general regularized reward
optimization problem. Our analysis shows that the sample complexity can be
improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional
conditions. Our results on the stochastic (variance-reduced) proximal gradient
method match the sample complexity of their most competitive counterparts under
similar settings in the RL literature.
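The PAGE estimator mentioned above can be sketched as follows: with some probability it recomputes a large-batch gradient, and otherwise it recycles the previous estimate with a cheap minibatch correction. The function names and the probability are illustrative, and a faithful implementation must evaluate both minibatch gradients on the same samples:

```python
import random

def page_gradient(grad_minibatch, grad_largebatch, x, x_prev, g_prev, p=0.5):
    """ProbAbilistic Gradient Estimator (PAGE), sketched: with probability p,
    refresh with a large-batch gradient; otherwise reuse the previous estimate
    plus a minibatch correction, grad_b(x) - grad_b(x_prev), on shared samples."""
    if random.random() < p:
        return grad_largebatch(x)
    gm, gp = grad_minibatch(x), grad_minibatch(x_prev)
    return [g + a - b for g, a, b in zip(g_prev, gm, gp)]
```

In the stochastic proximal gradient method, each such gradient estimate is followed by a proximal step on the regularizer.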
( 2
min )
In continual learning, catastrophic forgetting is affected by multiple
aspects of the tasks. Previous works have analyzed separately how forgetting is
affected by either task similarity or overparameterization. In contrast, our
paper examines how task similarity and overparameterization jointly affect
forgetting in an analyzable model. Specifically, we focus on two-task continual
linear regression, where the second task is a random orthogonal transformation
of an arbitrary first task (an abstraction of random permutation tasks). We
derive an exact analytical expression for the expected forgetting and uncover
a nuanced pattern: in highly overparameterized models, intermediate task
similarity causes the most forgetting. However, near the interpolation
threshold, forgetting decreases monotonically with the expected task
similarity. We validate our findings with linear regression on synthetic data,
and with neural networks on established permutation task benchmarks.
( 2
min )
This paper introduces SpecInfer, a system that accelerates generative large
language model (LLM) serving with tree-based speculative inference and
verification. The key idea behind SpecInfer is leveraging small speculative
models to predict the LLM's outputs; the predictions are organized as a token
tree, whose nodes each represent a candidate token sequence. The correctness of
all candidate token sequences represented by a token tree is verified against
the LLM in parallel using a novel tree-based parallel decoding mechanism.
SpecInfer uses an LLM as a token tree verifier instead of an incremental
decoder, which significantly reduces the end-to-end latency and computational
requirement for serving generative LLMs while provably preserving model
quality. Our evaluation shows that SpecInfer outperforms existing LLM serving
systems by 1.5-2.8x for distributed LLM inference and by 2.6-3.5x for
offloading-based LLM inference, while preserving the same generative
performance. SpecInfer is publicly available at
https://github.com/flexflow/FlexFlow/
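The token-tree idea can be illustrated with a deliberately simplified, sequential, greedy verifier (SpecInfer itself verifies all tree branches against the LLM in parallel, and the names here are invented):

```python
def verify_token_tree(prefix, tree, target_next_token):
    """Walk a speculative token tree, accepting a child only when it matches the
    target model's greedy next token; returns the longest verified continuation.
    `tree` maps each token to its subtree of speculated continuations."""
    accepted, node = [], tree
    while node:
        want = target_next_token(prefix + accepted)
        if want not in node:
            break
        accepted.append(want)
        node = node[want]
    return accepted
```

Every accepted token saves one incremental decoding step of the target LLM, which is where the latency reduction comes from.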
( 2
min )
In this paper, we investigate the intersection of large generative AI models
and cloud-native computing architectures. Recent large models such as ChatGPT,
while revolutionary in their capabilities, face challenges like escalating
costs and demand for high-end GPUs. Drawing analogies between
large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we
describe an AI-native computing paradigm that harnesses the power of both
cloud-native technologies (e.g., multi-tenancy and serverless computing) and
advanced machine learning runtime (e.g., batched LoRA inference). These joint
efforts aim to optimize costs-of-goods-sold (COGS) and improve resource
accessibility. The journey of merging these two domains is just at the
beginning and we hope to stimulate future research and development in this
area.
( 2
min )
We propose an approach for curating multimodal data that we used for our
entry in the 2023 DataComp competition filtering track. Our technique combines
object detection and weak supervision-based ensembling. In the first of two
steps in our approach, we employ an out-of-the-box zero-shot object detection
model to extract granular information and produce a variety of filter designs.
In the second step, we employ weak supervision to ensemble filtering rules.
This approach results in a 4% performance improvement when compared to the
best-performing baseline, producing the top-ranking position in the small scale
track at the time of writing. Furthermore, in the medium scale track, we
achieve a noteworthy 4.2% improvement over the baseline by simply ensembling
existing baselines with weak supervision.
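As a toy stand-in for the second step, the filter rules can be ensembled by voting; the actual entry uses weak supervision (a learned label model) rather than the plain majority vote shown here, and the example rules and record fields are invented:

```python
def ensemble_keep(example, filters, min_votes=2):
    """Keep an image-text pair only if at least `min_votes` filter rules vote
    to keep it (a majority-vote simplification of weak-supervision ensembling)."""
    return sum(f(example) for f in filters) >= min_votes

# Hypothetical filter rules built from object detection and similarity scores.
filters = [
    lambda ex: len(ex["caption"].split()) >= 3,  # caption is descriptive enough
    lambda ex: ex["num_objects"] > 0,            # zero-shot detector found something
    lambda ex: ex["clip_score"] > 0.25,          # image-text similarity rule
]
print(ensemble_keep({"caption": "a dog on grass", "num_objects": 1, "clip_score": 0.3}, filters))
```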
( 2
min )
Most existing neural network-based approaches for solving stochastic optimal
control problems using the associated backward dynamic programming principle
rely on the ability to simulate the underlying state variables. However, in
some problems, this simulation is infeasible, leading to the discretization of
state variable space and the need to train one neural network for each data
point. This approach becomes computationally inefficient when dealing with
large state variable spaces. In this paper, we consider a class of this type of
stochastic optimal control problems and introduce an effective solution
employing multitask neural networks. To train our multitask neural network, we
introduce a novel scheme that dynamically balances the learning across tasks.
Through numerical experiments on real-world derivatives pricing problems, we
show that our method outperforms state-of-the-art approaches.
( 2
min )
In this study, we introduce Orion-14B, a collection of multilingual large
language models with 14 billion parameters. We utilize a data scheduling
approach to train a foundational model on a diverse corpus of 2.5 trillion
tokens, sourced from texts in English, Chinese, Japanese, Korean, and other
languages. Additionally, we fine-tuned a series of models tailored for
conversational applications and other specific use cases. Our evaluation
results demonstrate that Orion-14B achieves state-of-the-art performance across
a broad spectrum of tasks. We make the Orion-14B model family and its
associated code publicly accessible https://github.com/OrionStarAI/Orion,
aiming to inspire future research and practical applications in the field.
( 2
min )
In this paper, we present a novel bilevel optimization-based approach
to training acoustic models for automatic speech recognition (ASR)
tasks, which we term bi-level joint unsupervised and supervised training
(BL-JUST). BL-JUST employs a lower- and upper-level optimization with an
unsupervised loss and a supervised loss, respectively, leveraging recent
advances in penalty-based bilevel optimization to solve this challenging ASR
problem with affordable complexity and rigorous convergence guarantees. To
evaluate BL-JUST, extensive experiments on the LibriSpeech and TED-LIUM v2
datasets have been conducted. BL-JUST achieves superior performance over the
commonly used pre-training followed by fine-tuning strategy.
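At its simplest, the penalty-based reformulation folds the lower-level objective into the upper-level one as a weighted term; the function shape and penalty weight below are purely illustrative:

```python
def bljust_objective(supervised_loss, unsupervised_loss, params, penalty=1.0):
    """Penalty-based bilevel surrogate: the lower-level (unsupervised) loss is
    added to the upper-level (supervised) loss with a penalty weight."""
    return supervised_loss(params) + penalty * unsupervised_loss(params)
```

In the actual method, the penalty weight and the optimization schedule are chosen so that the convergence guarantees of the bilevel formulation are retained.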
( 2
min )
Transfer learning for nonparametric regression is considered. We first study
the non-asymptotic minimax risk for this problem and develop a novel estimator
called the confidence thresholding estimator, which is shown to achieve the
minimax optimal risk up to a logarithmic factor. Our results demonstrate two
unique phenomena in transfer learning: auto-smoothing and super-acceleration,
which differentiate it from nonparametric regression in a traditional setting.
We then propose a data-driven algorithm that adaptively achieves the minimax
risk up to a logarithmic factor across a wide range of parameter spaces.
Simulation studies are conducted to evaluate the numerical performance of the
adaptive transfer learning algorithm, and a real-world example is provided to
demonstrate the benefits of the proposed method.
( 2
min )
We develop a versatile framework for statistical learning in non-stationary
environments. In each time period, our approach applies a stability principle
to select a look-back window that maximizes the utilization of historical data
while keeping the cumulative bias within an acceptable range relative to the
stochastic error. Our theory showcases the adaptability of this approach to
unknown non-stationarity. The regret bound is minimax optimal up to logarithmic
factors when the population losses are strongly convex, or Lipschitz only. At
the heart of our analysis lie two novel components: a measure of similarity
between functions and a segmentation technique for dividing the non-stationary
data sequence into quasi-stationary pieces.
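The stability principle can be sketched by scanning look-back windows and keeping the longest one whose cumulative-bias proxy still sits below the (shrinking) stochastic-error level; the stylized bias and noise functions below are assumptions, since the paper builds these quantities from data:

```python
def select_window(bias_proxy, noise_scale, t, c=1.0):
    """Pick the longest look-back window k whose cumulative-bias proxy is within
    a constant factor c of the stochastic error, assumed to decay as 1/sqrt(k)."""
    best = 1
    for k in range(1, t + 1):
        if bias_proxy(k) <= c * noise_scale / k ** 0.5:
            best = k
    return best

# Stylized example: bias grows linearly with the window, noise level is 1.
print(select_window(lambda k: 0.01 * k, 1.0, 50))
```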
( 2
min )
This post discusses how enterprises can build accurate, transparent, and secure generative AI applications while keeping full control over proprietary data. The proposed solution is a RAG pipeline using an AI-native technology stack, whose components are designed from the ground up with AI at their core, rather than having AI capabilities added as an afterthought. We demonstrate how to build an end-to-end RAG application using Cohere’s language models through Amazon Bedrock and a Weaviate vector database on AWS Marketplace.
( 13
min )
In a major stride toward building a shared national research infrastructure, the U.S. National Science Foundation has launched the National Artificial Intelligence Research Resource pilot program with significant support from NVIDIA. The initiative aims to broaden access to the tools needed to power responsible AI discovery and innovation. It was announced Wednesday.
( 7
min )
RTX Video HDR — first announced at CES — is now available for download through the January Studio Driver.
( 8
min )
Interatomic potentials learned using machine learning methods have been
successfully applied to atomistic simulations. However, accurate models require
large training datasets, while generating reference calculations is
computationally demanding. To bypass this difficulty, we propose a transfer
learning algorithm that leverages the ability of graph neural networks (GNNs)
to represent chemical environments together with kernel mean embeddings. We
extract a feature map from GNNs pre-trained on the OC20 dataset and use it to
learn the potential energy surface from system-specific datasets of catalytic
processes. Our method is further enhanced by incorporating into the kernel the
chemical species information, resulting in improved performance and
interpretability. We test our approach on a series of realistic datasets of
increasing complexity, showing excellent generalization and transferability
performance, and improving on methods that rely on GNNs or ridge regression
alone, as well as similar fine-tuning approaches.
( 2
min )
Biomedical literature is growing rapidly, making it challenging to curate and
extract knowledge manually. Biomedical natural language processing (BioNLP)
techniques that can automatically extract information from biomedical
literature help alleviate this burden. Recently, large Language Models (LLMs),
such as GPT-3 and GPT-4, have gained significant attention for their impressive
performance. However, their effectiveness in BioNLP tasks and impact on method
development and downstream users remain understudied. This pilot study (1)
establishes the baseline performance of GPT-3 and GPT-4 in both zero-shot and
one-shot settings in eight BioNLP datasets across four applications: named
entity recognition, relation extraction, multi-label document classification,
and semantic similarity and reasoning, (2) examines the errors produced by the
LLMs and categorizes them into three types: missingness, inconsistencies,
and unwanted artificial content, and (3) provides suggestions for using LLMs in
BioNLP applications. We make the datasets, baselines, and results publicly
available to the community via
https://github.com/qingyu-qc/gpt_bionlp_benchmark.
( 2
min )
Using a vocabulary that is shared across languages is common practice in
Multilingual Neural Machine Translation (MNMT). In addition to its simple
design, shared tokens play an important role in positive knowledge transfer,
assuming that shared tokens refer to similar meanings across languages.
However, when word overlap is small, especially due to different writing
systems, transfer is inhibited. In this paper, we define word-level information
transfer pathways via word equivalence classes and rely on graph networks to
fuse word embeddings across languages. Our experiments demonstrate the
advantages of our approach: 1) embeddings of words with similar meanings are
better aligned across languages, 2) our method achieves consistent BLEU
improvements of up to 2.3 points for high- and low-resource MNMT, and 3) less
than 1.0% additional trainable parameters are required with a limited increase
in computational costs, while inference time remains identical to the baseline.
We release the codebase to the community.
( 2
min )
Current methods to identify and classify racist language in text rely on
small-n qualitative approaches or large-n approaches focusing exclusively on
overt forms of racist discourse. This article provides a step-by-step
generalizable guideline to identify and classify different forms of racist
discourse in large corpora. In our approach, we start by conceptualizing racism
and its different manifestations. We then contextualize these racist
manifestations to the time and place of interest, which allows researchers to
identify their discursive form. Finally, we apply XLM-RoBERTa (XLM-R), a
cross-lingual model for supervised text classification with a cutting-edge
contextual understanding of text. We show that XLM-R and XLM-R-Racismo, our
pretrained model, outperform other state-of-the-art approaches in classifying
racism in large corpora. We illustrate our approach using a corpus of tweets
relating to the Ecuadorian indígena community between 2018 and 2021.
( 2
min )
Predicting crowd intents and trajectories is crucial in various real-world
applications, including service robots and autonomous vehicles. Understanding
environmental dynamics is challenging, not only due to the complexities of
modeling pair-wise spatial and temporal interactions but also the diverse
influence of group-wise interactions. To decode the comprehensive pair-wise and
group-wise interactions in crowded scenarios, we introduce Hyper-STTN, a
Hypergraph-based Spatial-Temporal Transformer Network for crowd trajectory
prediction. In Hyper-STTN, crowded group-wise correlations are constructed
using a set of multi-scale hypergraphs with varying group sizes, captured
through random-walk probability-based hypergraph spectral convolution.
Additionally, a spatial-temporal transformer is adapted to capture pedestrians'
pair-wise latent interactions in spatial-temporal dimensions. These
heterogeneous group-wise and pair-wise interactions are then fused and aligned
through a multimodal transformer network. Hyper-STTN outperforms other
state-of-the-art baselines and ablation models on 5 real-world pedestrian
motion datasets.
( 2
min )
Deep neural networks (DNNs) could be deceived by generating
human-imperceptible perturbations of clean samples. Therefore, enhancing the
robustness of DNNs against adversarial attacks is a crucial task. In this
paper, we aim to train robust DNNs by limiting the set of outputs reachable via
a norm-bounded perturbation added to a clean sample. We refer to this set as
adversarial polytope, and each clean sample has a respective adversarial
polytope. Indeed, if the respective polytopes for all the samples are compact
such that they do not intersect the decision boundaries of the DNN, then the
DNN is robust against adversarial samples. Hence, the inner workings of our
algorithm are based on learning confined adversarial
polytopes (CAP). By conducting a thorough set of experiments, we
demonstrate the effectiveness of CAP over existing adversarial robustness
methods in improving the robustness of models against state-of-the-art attacks
including AutoAttack.
( 2
min )
Strategies for partially observable Markov decision processes (POMDP)
typically require memory. One way to represent this memory is via automata. We
present a method to learn an automaton representation of a strategy using a
modification of the L*-algorithm. Compared to the tabular representation of a
strategy, the resulting automaton is dramatically smaller and thus also more
explainable. Moreover, in the learning process, our heuristics may even improve
the strategy's performance. In contrast to approaches that synthesize an
automaton directly from the POMDP thereby solving it, our approach is
incomparably more scalable.
( 2
min )
Stochastic differential equations (SDEs) have been widely used to model real
world random phenomena. Existing works mainly focus on the case where the time
series is modeled by a single SDE, which might be restrictive for modeling time
series with distributional shift. In this work, we propose a change point
detection algorithm for time series modeled as neural SDEs. Given a time series
dataset, the proposed method jointly learns the unknown change points and the
parameters of distinct neural SDE models corresponding to each change point.
Specifically, the SDEs are learned under the framework of generative
adversarial networks (GANs) and the change points are detected based on the
output of the GAN discriminator in a forward pass. At each step of the proposed
algorithm, the change points and the SDE model parameters are updated in an
alternating fashion. Numerical results on both synthetic and real datasets are
provided to validate the performance of our algorithm in comparison to
classical change point detection benchmarks, standard GAN-based neural SDEs,
and other state-of-the-art deep generative models for time series data.
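The detection step can be caricatured as thresholding the discriminator's output in a forward pass; the exact decision rule used in the paper is richer, so the crossing rule and threshold here are assumptions:

```python
def detect_change_points(scores, threshold=0.5):
    """Declare a change point wherever the discriminator score for the currently
    fitted SDE model crosses below `threshold` (a proxy for distribution shift)."""
    return [t for t in range(1, len(scores))
            if scores[t] < threshold <= scores[t - 1]]

print(detect_change_points([0.9] * 10 + [0.2] * 10))  # -> [10]
```

In the alternating scheme, each detected change point then triggers a refit of the neural SDE parameters on the corresponding segment.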
( 2
min )
Time-series anomaly detection deals with the problem of detecting anomalous
timesteps by learning normality from the sequence of observations. However, the
concept of normality evolves over time, leading to a "new normal problem",
where the distribution of normality can be changed due to the distribution
shifts between training and test data. This paper highlights the prevalence of
the new normal problem in unsupervised time-series anomaly detection studies.
To tackle this issue, we propose a simple yet effective test-time adaptation
strategy based on trend estimation and a self-supervised approach to learning
new normalities during inference. Extensive experiments on real-world
benchmarks demonstrate that incorporating the proposed strategy into the
anomaly detector consistently improves the model's performance compared to the
baselines, leading to robustness to the distribution shifts.
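One simple form of the trend-estimation idea is to maintain an exponential moving average at test time and let the detector score residuals; the smoothing constant and the choice of an EMA are assumptions for this sketch:

```python
def detrend(series, alpha=0.1):
    """Estimate a slowly-moving trend with an exponential moving average and
    return (trend, residuals); the anomaly detector would score the residuals
    instead of the raw values, adapting to a drifting notion of 'normal'."""
    level = series[0]
    trend, residuals = [], []
    for x in series:
        level = alpha * x + (1 - alpha) * level
        trend.append(level)
        residuals.append(x - level)
    return trend, residuals
```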
( 2
min )
This paper introduces a novel approach for topic modeling utilizing latent
codebooks from Vector-Quantized Variational Auto-Encoder~(VQ-VAE), discretely
encapsulating the rich information of the pre-trained embeddings such as the
pre-trained language model. From the novel interpretation of the latent
codebooks and embeddings as conceptual bag-of-words, we propose a new
generative topic model called Topic-VQ-VAE (TVQ-VAE), which inversely generates
the original documents related to the respective latent codebook. The TVQ-VAE
can visualize the topics with various generative distributions including the
traditional BoW distribution and the autoregressive image generation. Our
experimental results on document analysis and image generation demonstrate that
TVQ-VAE effectively captures the topic context which reveals the underlying
structures of the dataset and supports flexible forms of document generation.
Official implementation of the proposed TVQ-VAE is available at
https://github.com/clovaai/TVQ-VAE.
( 2
min )
Computational offloading has become an enabling component for edge
intelligence in mobile and smart devices. Existing offloading schemes mainly
focus on mobile devices and servers, while ignoring the potential network
congestion caused by tasks from multiple mobile devices, especially in wireless
multi-hop networks. To fill this gap, we propose a low-overhead,
congestion-aware distributed task offloading scheme by augmenting a distributed
greedy framework with graph-based machine learning. In simulated wireless
multi-hop networks with 20-110 nodes and a resource allocation scheme based on
shortest path routing and contention-based link scheduling, our approach is
demonstrated to be effective in reducing congestion and unstable queues
compared with the context-agnostic baseline, while improving the execution
latency over local
computing.
( 2
min )
This paper revisits a class of convex Finite-Sum Coupled Compositional
Stochastic Optimization (cFCCO) problems with many applications, including
group distributionally robust optimization (GDRO), learning with imbalanced
data, reinforcement learning, and learning to rank. To better solve these
problems, we introduce an efficient single-loop primal-dual block-coordinate
proximal algorithm, dubbed ALEXR. This algorithm leverages block-coordinate
stochastic mirror ascent updates for the dual variable and stochastic proximal
gradient descent updates for the primal variable. We establish the convergence
rates of ALEXR in both convex and strongly convex cases under smoothness and
non-smoothness conditions of involved functions, which not only improve the
best rates in previous works on smooth cFCCO problems but also expand the realm
of cFCCO for solving more challenging non-smooth problems such as the dual form
of GDRO. Finally, we present lower complexity bounds to demonstrate that the
convergence rates of ALEXR are optimal among first-order block-coordinate
stochastic algorithms for the considered class of cFCCO problems.
( 2
min )
Spiking Neural Networks (SNNs) have gained considerable attention due to their
energy-efficient and multiplication-free characteristics. The continuous growth
in the scale of deep SNNs poses challenges for model deployment. Network pruning
reduces hardware resource requirements of model deployment by compressing the
network scale. However, existing SNN pruning methods cause high pruning costs
and performance loss because the pruning iterations amplify the training
difficulty of SNNs. In this paper, inspired by the critical brain hypothesis in
neuroscience, we propose a regeneration mechanism based on the neuron
criticality for SNN pruning to enhance feature extraction and accelerate the
pruning process. First, we propose a low-cost metric for criticality in
SNNs. Then, we re-rank the pruned structures and regenerate those
with higher criticality to obtain the critical network. Our method achieves
higher performance than the current state-of-the-art (SOTA) method with up to
95.26% reduction of pruning cost. Moreover, we investigate the underlying
mechanism of our method and find that it efficiently selects potential
structures and learns consistent feature representations.
( 2
min )
Using the atomic cluster expansion (ACE) framework, we develop a machine
learning interatomic potential for fast and accurate modelling of the phonon
transport properties of wurtzite aluminum nitride. The predictive power of the
ACE potential against density functional theory (DFT) is demonstrated across a
broad range of properties of w-AlN, including ground-state lattice parameters,
specific heat capacity, coefficients of thermal expansion, bulk modulus, and
harmonic phonon dispersions. Validation of lattice thermal conductivity is
further carried out by comparing the ACE-predicted values to the DFT
calculations and experiments, exhibiting the overall capability of our ACE
potential in sufficiently describing anharmonic phonon interactions. As a
practical application, we perform a lattice dynamics analysis using the
potential to unravel the effects of biaxial strains on thermal conductivity and
phonon properties of w-AlN, which is identified as a significant tuning factor
for near-junction thermal design of w-AlN-based electronics.
( 2
min )
We sample from a given target distribution by constructing a neural network
which maps samples from a simple reference, e.g. the standard normal
distribution, to samples from the target. To that end, we propose using a
neural network architecture inspired by the Langevin Monte Carlo (LMC)
algorithm. Based on LMC perturbation results, we show approximation rates of
the proposed architecture for smooth, log-concave target distributions measured
in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of
sub-Gaussianity of the intermediate measures of the perturbed LMC process. In
particular, we derive bounds on the growth of the intermediate variance proxies
under different assumptions on the perturbations. Moreover, we propose an
architecture similar to deep residual neural networks and derive expressivity
results for approximating the sample to target distribution map.
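The building block of the architecture is the LMC update itself, which reads like a residual layer with injected noise. A minimal sketch for a generic log-density gradient (the names and step size are illustrative):

```python
import math
import random

def lmc_step(x, grad_log_target, step=0.1, rng=random):
    """One Langevin Monte Carlo update:
    x' = x + step * grad log p(x) + sqrt(2 * step) * standard Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    return [xi + step * g + math.sqrt(2 * step) * n
            for xi, g, n in zip(x, grad_log_target(x), noise)]

# For a standard normal target, grad log p(x) = -x.
sample = lmc_step([1.0, -2.0], lambda x: [-xi for xi in x])
```

Stacking such steps, with the noise treated as an extra input, yields the residual-network-like map from reference samples to approximate target samples that the paper analyzes.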
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
( 2
min )
In this paper, we present conditions for identifying the generator of a
linear stochastic differential equation (SDE) from the distribution of its
solution process with a given fixed initial state. These identifiability
conditions are crucial in causal inference using linear SDEs as they enable the
identification of the post-intervention distributions from its observational
distribution. Specifically, we derive a sufficient and necessary condition for
identifying the generator of linear SDEs with additive noise, as well as a
sufficient condition for identifying the generator of linear SDEs with
multiplicative noise. We show that the conditions derived for both types of
SDEs are generic. Moreover, we offer geometric interpretations of the derived
identifiability conditions to enhance their understanding. To validate our
theoretical results, we perform a series of simulations, which support and
substantiate the established findings.
( 2
min )
We establish finite-sample guarantees for efficient proper learning of
bounded-degree polytrees, a rich class of high-dimensional probability
distributions and a subclass of Bayesian networks, a widely-studied type of
graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample
guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees.
We extend their results by providing an efficient algorithm which learns
$d$-polytrees in polynomial time and sample complexity for any bounded $d$ when
the underlying undirected graph (skeleton) is known. We complement our
algorithm with an information-theoretic sample complexity lower bound, showing
that the dependence on the dimension and target accuracy parameters are nearly
tight.
( 2
min )
This study preprocessed 2000-2019 energy consumption data for 46 key Sichuan
industries using matrix normalization. DBSCAN clustering identified 16 feature
classes to objectively group industries. Penalized regression models were then
applied for their advantages in overfitting control, high-dimensional data
processing, and feature selection, making them well-suited for the complex
energy data. Results showed that the second cluster, centered on coal, had the
highest emissions due to
production needs. Emissions from gasoline-focused and coke-focused clusters
were also significant. Based on this, emission reduction suggestions included
clean coal technologies, transportation management, coal-electricity
replacement in steel, and industry standardization. The research introduced
unsupervised learning to objectively select factors and aimed to explore new
emission reduction avenues. In summary, the study identified industry
groupings, assessed emissions drivers, and proposed scientific reduction
strategies to better inform decision-making using algorithms like DBSCAN and
penalized regression models.
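The study's "matrix normalization" step is not specified in the abstract; a common concrete choice is column-wise min-max scaling of the industries-by-energy-sources matrix, sketched here under that assumption:

```python
def minmax_normalize(matrix):
    """Column-wise min-max normalization of a rows-by-features matrix;
    constant columns are left at zero by falling back to a span of 1.0."""
    cols = list(zip(*matrix))
    spans = [(min(c), (max(c) - min(c)) or 1.0) for c in cols]
    return [[(v - lo) / span for v, (lo, span) in zip(row, spans)]
            for row in matrix]

# Toy matrix: rows are industries, columns are energy sources.
print(minmax_normalize([[0, 10], [5, 20], [10, 30]]))
```

Putting every feature on a common [0, 1] scale prevents any single energy source from dominating the distance computations in DBSCAN.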
( 2
min )
Large language models (LLMs) have significantly improved the ability to
perform tasks in the field of code generation. However, there is still a gap
between LLMs being capable coders and being top-tier software engineers. Based
on the observation that top-level software engineers often ask clarifying
questions to reduce ambiguity in both requirements and coding solutions, I
argue that the same should be applied to LLMs for code generation tasks. By
asking probing questions in various topics before generating the final code,
the challenges of programming with LLMs, such as unclear intent specification,
lack of computational thinking, and undesired code quality, may be alleviated.
This, in turn, increases confidence in the generated code. In this work, I
explore how to leverage better communication skills to achieve greater
confidence in generated code. I propose a communication-centered process that
uses an LLM-generated communicator to identify issues with high ambiguity or
low confidence in problem descriptions and generated code. I then ask
clarifying questions to obtain responses from users for refining the code.
( 3
min )
We study online multiclass classification under bandit feedback. We extend
the results of Daniely and Helbertal [2013] by showing that the finiteness of
the Bandit Littlestone dimension is necessary and sufficient for bandit online
learnability even when the label space is unbounded. Moreover, we show that,
unlike the full-information setting, sequential uniform convergence is
necessary but not sufficient for bandit online learnability. Our result
complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023]
who show that the Littlestone dimension characterizes online multiclass
learnability in the full-information setting even when the label space is
unbounded.
( 2
min )
This paper presents a deep reinforcement learning solution for optimizing
multi-UAV cell-association decisions and their moving velocity on a 3D aerial
highway. The objective is to enhance transportation and communication
performance, including collision avoidance, connectivity, and handovers. The
problem is formulated as a Markov decision process (MDP) with UAVs' states
defined by velocities and communication data rates. We propose a neural
architecture with a shared decision module and multiple network branches, each
dedicated to a specific action dimension in a 2D transportation-communication
space. This design efficiently handles the multi-dimensional action space,
allowing independence for individual action dimensions. We introduce two
models, Branching Dueling Q-Network (BDQ) and Branching Dueling Double Deep
Q-Network (Dueling DDQN), to demonstrate the approach. Simulation results show
a significant improvement of 18.32% compared to existing benchmarks.
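To make the branching design concrete, here is a minimal NumPy sketch of BDQ-style dueling aggregation with a shared state-value head and one advantage head per action dimension; the layer shapes and the two branches (velocity, cell association) are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def branching_dueling_q(state_feat, W_value, branch_advantages):
    """BDQ-style aggregation: one shared V(s) plus per-branch advantages.

    state_feat: (d,) features from the shared decision module.
    W_value: (d,) weights of the scalar value head.
    branch_advantages: list of (d, n_actions_b) weight matrices, one per
        action dimension (e.g. a velocity branch and a cell-association
        branch). Returns one Q-vector per branch.
    """
    v = state_feat @ W_value  # shared scalar state value V(s)
    qs = []
    for W_a in branch_advantages:
        adv = state_feat @ W_a           # advantages A_b(s, a) for branch b
        qs.append(v + adv - adv.mean())  # dueling aggregation per branch
    return qs

rng = np.random.default_rng(0)
feat = rng.normal(size=8)
q_vel, q_cell = branching_dueling_q(
    feat, rng.normal(size=8),
    [rng.normal(size=(8, 5)), rng.normal(size=(8, 3))])
```

Keeping the branches separate is what lets the action space grow additively (5 + 3 heads) rather than combinatorially (5 x 3 joint actions).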
( 2
min )
Deep Neural Networks (DNNs) have emerged as an effective approach to tackling
real-world problems. However, like human-written software, DNNs can have bugs
and can be attacked. To address this, research has explored a wide range of
algorithmic approaches to verify DNN behavior. In this work, we introduce
NeuralSAT, a new verification approach that adapts the DPLL(T) algorithm
widely used in modern SMT solvers. A key feature of SMT solvers is the use
of conflict clause learning and search restart to scale verification. Unlike
prior DNN verification approaches, NeuralSAT combines an abstraction-based
deductive theory solver with clause learning and an evaluation clearly
demonstrates the benefits of the approach on a set of challenging verification
benchmarks.
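For readers unfamiliar with the propositional core that DPLL(T)-style verifiers build on, the following is a minimal DPLL search with unit propagation; NeuralSAT's abstraction-based theory solver, conflict clause learning, and restarts are deliberately omitted.

```python
def dpll(clauses, assignment=None):
    """Minimal DPLL SAT search with unit propagation.

    clauses: list of clauses, each a set of signed ints (e.g. {1, -2}
    meaning x1 OR NOT x2). Returns a satisfying assignment dict or None.
    """
    if assignment is None:
        assignment = {}
    # Unit propagation: repeatedly assign literals forced by unit clauses.
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None  # conflict: clause falsified
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    # Branch on the smallest unassigned variable.
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:
        return assignment
    var = min(free)
    for val in (True, False):
        result = dpll(clauses, {**assignment, var: val})
        if result is not None:
            return result
    return None

model = dpll([{1, 2}, {-1, 2}, {-2, 3}])
```

In a DPLL(T) verifier, the theory solver would additionally check each partial assignment against the DNN abstraction and return learned conflict clauses.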
( 2
min )
Bilevel programming has emerged as a valuable tool for hyperparameter
selection, a central concern in machine learning. In a recent study by Ye et
al. (2023), a value function-based difference of convex algorithm was
introduced to address bilevel programs. This approach proves particularly
powerful when dealing with scenarios where the lower-level problem exhibits
convexity in both the upper-level and lower-level variables. Examples of such
scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized
regression. In this paper, we significantly expand the range of applications,
now requiring convexity only in the lower-level variables of the lower-level
program. We present an innovative single-level difference of weakly convex
reformulation based on the Moreau envelope of the lower-level problem. We
further develop a sequentially convergent Inexact Proximal Difference of Weakly
Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed
iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for
kernel support vector machines on simulated data.
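The Moreau envelope underlying the reformulation is the standard one; for a function $f$ and smoothing parameter $\gamma > 0$ it reads:

```latex
e_{\gamma} f(x) \;=\; \min_{y} \; f(y) \;+\; \frac{1}{2\gamma}\,\|y - x\|^{2}
```

Applied to the lower-level value function, this produces a weakly convex surrogate, which is what makes the difference-of-weakly-convex structure available (the paper's exact smoothing may differ in detail).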
( 2
min )
Federated learning (FL) has garnered considerable attention due to its
privacy-preserving feature. Nonetheless, the lack of freedom in managing user
data can lead to group fairness issues, where models are biased towards
sensitive factors such as race or gender. To tackle this issue, this paper
proposes a novel algorithm, fair federated averaging with augmented Lagrangian
method (FFALM), designed explicitly to address group fairness issues in FL.
Specifically, we impose a fairness constraint on the training objective and
solve the minimax reformulation of the constrained optimization problem. Then,
we derive the theoretical upper bound for the convergence rate of FFALM. The
effectiveness of FFALM in improving fairness is shown empirically on CelebA and
UTKFace datasets in the presence of severe statistical heterogeneity.
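As a sketch of the minimax reformulation (with assumed notation, not necessarily FFALM's exact objective): writing the training loss as $f(\theta)$ and the fairness constraint as $g(\theta) \le 0$, the augmented Lagrangian method solves

```latex
\min_{\theta} \; \max_{\lambda \ge 0} \;\; f(\theta) \;+\; \lambda\, g(\theta) \;+\; \frac{\rho}{2}\, g(\theta)^{2}
```

where $\lambda$ is the dual variable and $\rho > 0$ the penalty parameter.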
( 2
min )
The posterior collapse phenomenon in variational autoencoder (VAE), where the
variational posterior distribution closely matches the prior distribution, can
hinder the quality of the learned latent variables. As a consequence of
posterior collapse, the latent variables extracted by the encoder in VAE
preserve less information from the input data and thus fail to produce
meaningful representations as input to the reconstruction process in the
decoder. While this phenomenon has been an actively addressed topic related to
VAE performance, the theory for posterior collapse remains underdeveloped,
especially beyond the standard VAE. In this work, we advance the theoretical
understanding of posterior collapse to two important and prevalent yet less
studied classes of VAE: conditional VAE and hierarchical VAE. Specifically, via
a non-trivial theoretical analysis of linear conditional VAE and hierarchical
VAE with two levels of latent variables, we prove that the causes of posterior
collapse in these models include the correlation between the input and output
of the conditional VAE and the effect of learnable encoder variance in the
hierarchical VAE. We empirically validate our theoretical findings for linear
conditional and hierarchical VAE and demonstrate that these results are also
predictive for non-linear cases with extensive experiments.
( 3
min )
This document presents a stock market analysis conducted on a dataset
consisting of 750 instances and 16 attributes, donated on 2014-10-23. The
analysis includes an exploratory data analysis (EDA) section, feature
engineering, data preparation, model selection, and insights from the analysis.
The Fama French 3-factor model is also utilized in the analysis. The results of
the analysis are presented, with linear regression being the best-performing
model.
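As a hedged illustration of the modeling step, this sketch fits the Fama-French 3-factor regression by ordinary least squares on synthetic data; the factor values and coefficients below are made up, and the document's actual dataset is not reproduced.

```python
import numpy as np

# Regress excess returns on the three Fama-French factors
# (market excess return Mkt-RF, SMB, HML) with OLS.
rng = np.random.default_rng(42)
n = 750                                # matches the dataset's instance count
factors = rng.normal(size=(n, 3))      # synthetic Mkt-RF, SMB, HML
true_beta = np.array([1.1, 0.4, -0.2]) # hypothetical factor loadings
excess_ret = factors @ true_beta + 0.01 * rng.normal(size=n)

X = np.column_stack([np.ones(n), factors])   # intercept column gives alpha
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha, betas = coef[0], coef[1:]
```

The recovered loadings match the generating coefficients closely, which is exactly the behavior a linear regression baseline exploits on factor data.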
( 2
min )
In the era of information proliferation, discerning the credibility of news
content poses an ever-growing challenge. This paper introduces RELIANCE, a
pioneering ensemble learning system designed for robust information and fake
news credibility evaluation. Comprising five diverse base models, including
Support Vector Machine (SVM), naive Bayes, logistic regression, random forest,
and Bidirectional Long Short Term Memory Networks (BiLSTMs), RELIANCE employs
an innovative approach to integrate their strengths, harnessing the collective
intelligence of the ensemble for enhanced accuracy. Experiments demonstrate the
superiority of RELIANCE over individual models, indicating its efficacy in
distinguishing between credible and non-credible information sources. RELIANCE
also surpasses baseline models in information and news credibility assessment,
establishing itself as an effective solution for evaluating the reliability of
information sources.
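A minimal sketch of the hard-voting idea underlying such an ensemble; RELIANCE's actual integration scheme may weight or stack the base models differently, so treat this as a generic illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label predictions by simple majority vote.

    predictions: list of label lists, one per base model.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Three hypothetical base models scoring five articles: credible (1) or not (0).
svm_preds = [1, 0, 1, 1, 0]
nb_preds = [1, 1, 1, 0, 0]
lr_preds = [0, 0, 1, 1, 1]
ensemble = majority_vote([svm_preds, nb_preds, lr_preds])
```

Each article's final label is the one most base models agree on, which is why an ensemble can outperform any individual model when their errors are not correlated.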
( 2
min )
With the fast development of Deep Learning techniques, Named Entity
Recognition (NER) is becoming more and more important in the information
extraction task. The greatest difficulty the NER task faces is maintaining
detection performance even when NE types and documents are unfamiliar. Realizing
that the specificity information may contain potential meanings of a word and
generate semantic-related features for word embedding, we develop a
distribution-aware word embedding and implement three different methods to make
use of the distribution information in an NER framework. The results show that
NER performance improves when word specificity is incorporated into existing
NER methods.
( 2
min )
The question of what makes a data distribution suitable for deep learning is
a fundamental open problem. Focusing on locally connected neural networks (a
prevalent family of architectures that includes convolutional and recurrent
neural networks as well as local self-attention models), we address this
problem by adopting theoretical tools from quantum physics. Our main
theoretical result states that a certain locally connected neural network is
capable of accurate prediction over a data distribution if and only if the data
distribution admits low quantum entanglement under certain canonical partitions
of features. As a practical application of this result, we derive a
preprocessing method for enhancing the suitability of a data distribution to
locally connected neural networks. Experiments with widespread models over
various datasets demonstrate our findings. We hope that our use of quantum
entanglement will encourage further adoption of tools from physics for formally
reasoning about the relation between deep learning and real-world data.
( 3
min )
Sutton, Szepesvári and Maei introduced the first gradient
temporal-difference (GTD) learning algorithms compatible with both linear
function approximation and off-policy training. The goal of this paper is (a)
to propose some variants of GTDs with extensive comparative analysis and (b) to
establish new theoretical analysis frameworks for the GTDs. These variants are
based on convex-concave saddle-point interpretations of GTDs, which effectively
unify all the GTDs into a single framework, and provide simple stability
analysis based on recent results on primal-dual gradient dynamics. Finally,
numerical comparative analysis is given to evaluate these approaches.
( 2
min )
Continual learning aims to train a model incrementally on a sequence of tasks
without forgetting previous knowledge. Although continual learning has been
widely studied in computer vision, its application to Vision+Language tasks is
not that straightforward, as settings can be parameterized in multiple ways
according to their input modalities. In this paper, we present a detailed study
of how different settings affect performance for Visual Question Answering. We
first propose three plausible task formulations and demonstrate their impact on
the performance of continual learning algorithms. We break down several factors
of task similarity, showing that performance and sensitivity to task order
highly depend on the shift of the output distribution. We also investigate the
potential of pretrained models and compare the robustness of transformer models
with different visual embeddings. Finally, we provide an analysis interpreting
model representations and their impact on forgetting. Our results highlight the
importance of stabilizing visual representations in deeper layers.
( 2
min )
A code generation model generates code by taking a prompt from a code
comment, existing code, or a combination of both. Although code generation
models (e.g., GitHub Copilot) are increasingly being adopted in practice, it is
unclear whether they can successfully be used for unit test generation without
fine-tuning for a strongly typed language like Java. To fill this gap, we
investigated how well three models (Codex, GPT-3.5-Turbo, and StarCoder) can
generate unit tests. We used two benchmarks (HumanEval and EvoSuite SF110) to
investigate the effect of context generation on the unit test generation
process. We evaluated the models based on compilation rates, test correctness,
test coverage, and test smells. We found that the Codex model achieved above
80% coverage for the HumanEval dataset, but no model had more than 2% coverage
for the EvoSuite SF110 benchmark. The generated tests also suffered from test
smells, such as Duplicated Asserts and Empty Tests.
( 2
min )
There has been considerable recent interest in estimating heterogeneous
causal effects. In this paper, we introduce conditional average partial causal
effects (CAPCE) to reveal the heterogeneity of causal effects with continuous
treatment. We provide conditions for identifying CAPCE in an instrumental
variable setting. We develop three families of CAPCE estimators: sieve,
parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze
their statistical properties. We illustrate the proposed CAPCE estimators on
synthetic and real-world data.
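For intuition, one natural way to write a conditional average partial causal effect for a continuous treatment $t$, potential outcome $Y(t)$, and covariates $X$ is the following (notation assumed here for illustration, not taken from the paper):

```latex
\mathrm{CAPCE}(t, x) \;=\; \mathbb{E}\!\left[\left.\frac{\partial Y(t)}{\partial t}\,\right|\, X = x\right]
```

i.e. the average marginal effect of the treatment, conditioned on covariates to expose heterogeneity.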
( 2
min )
Identifying important features linked to a response variable is a fundamental
task in various scientific domains. This article explores statistical inference
for simulated Markov random fields in high-dimensional settings. We introduce a
methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation
(MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC
method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We
propose a decorrelated score test, establishing both its asymptotic normality
and that of a one-step estimator, along with the associated confidence
interval. Furthermore, we construct two false discovery rate control procedures
via the asymptotic behaviors for both p-values and e-values. Comprehensive
numerical simulations confirm the theoretical validity of the proposed methods.
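The Elastic-net regularizer referred to above is the standard combination of $\ell_1$ and $\ell_2$ penalties; with tuning parameters $\lambda > 0$ and $\alpha \in [0, 1]$ (the exact weighting used by the paper is assumed):

```latex
\mathrm{pen}_{\lambda,\alpha}(\theta) \;=\; \lambda \left( \alpha\, \|\theta\|_{1} \;+\; \frac{1-\alpha}{2}\, \|\theta\|_{2}^{2} \right)
```

The $\ell_1$ part induces sparsity (supporting $\ell_{1}$-consistency), while the $\ell_2$ part stabilizes estimation under correlated features.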
( 2
min )
In modern radar systems, precise target localization using azimuth and
velocity estimation is paramount. Traditional unbiased estimation methods have
leveraged gradient descent algorithms to reach the theoretical limits of the
Cramér-Rao Bound (CRB) for the error of the parameter estimates. In this
study, we present a data-driven neural network approach that outperforms these
traditional techniques, demonstrating improved accuracies in target azimuth and
velocity estimation. Using a representative simulated scenario, we show that
our proposed neural network model consistently achieves improved parameter
estimates due to its inherently biased nature, yielding a diminished mean
squared error (MSE). Our findings underscore the potential of employing deep
learning methods in radar systems, paving the way for more accurate
localization in cluttered and dynamic environments.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that estimates probability distributions using a
functional analytic approach: first, it finds a smooth functional estimate of
the probability distribution, whether it is normalized or not; second, the
algorithm provides an estimate of the normalizing weight; and third, the
algorithm proposes a new computation scheme to compute such estimates.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space of our construction. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. Computations can be parallelized and completed
in one pass.
OPAA can be applied broadly to the estimation of probability density
functions. In Bayesian problems, it can be applied to estimating the
normalizing weight of the posterior, which is also known as the evidence,
serving as an alternative to existing optimization-based methods.
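The evidence identity described above is a Parseval-type statement; writing $c_j$ for the coefficients of the transformed $\sqrt{\tilde p}$ in an orthonormal basis (notation assumed), the normalizing weight is

```latex
Z \;=\; \bigl\| \sqrt{\tilde p}\, \bigr\|_{L^{2}}^{2} \;=\; \sum_{j} c_{j}^{2}
```

which is why the coefficient computations can be parallelized and summed in one pass.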
( 2
min )
In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
( 2
min )
The manifold scattering transform is a deep feature extractor for data
defined on a Riemannian manifold. It is one of the first examples of extending
convolutional neural network-like operators to general manifolds. The initial
work on this model focused primarily on its theoretical stability and
invariance properties but did not provide methods for its numerical
implementation except in the case of two-dimensional surfaces with predefined
meshes. In this work, we present practical schemes, based on the theory of
diffusion maps, for implementing the manifold scattering transform to datasets
arising in naturalistic systems, such as single cell genetics, where the data
is a high-dimensional point cloud modeled as lying on a low-dimensional
manifold. We show that our methods are effective for signal classification and
manifold classification tasks.
( 2
min )
In recent times machine learning methods have made significant advances in
becoming a useful tool for analyzing physical systems. A particularly active
area in this theme has been "physics-informed machine learning" which focuses
on using neural nets for numerically solving differential equations. In this
work, we aim to advance the theory of measuring out-of-sample error while
training DeepONets -- which are among the most versatile ways to solve PDE
systems in one shot.
Firstly, for a class of DeepONets, we prove a bound on their Rademacher
complexity which does not explicitly scale with the width of the nets involved.
Secondly, we use this to show how the Huber loss can be chosen so that for
these DeepONet classes generalization error bounds can be obtained that have no
explicit dependence on the size of the nets. We note that our theoretical
results apply to any PDE being targeted to be solved by DeepONets.
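The DeepONet evaluation itself is a branch-trunk inner product, $G(u)(y) \approx \sum_k b_k(u)\, t_k(y)$; the sketch below uses one-layer tanh nets as stand-ins for the deep branch and trunk networks (an assumption for brevity).

```python
import numpy as np

def deeponet_forward(u_sensors, y, B, T):
    """Branch-trunk evaluation G(u)(y) ~= sum_k b_k(u) * t_k(y).

    u_sensors: (m,) input function sampled at m sensor points.
    y: (d,) query location in the output domain.
    B, T: weight matrices of toy one-layer branch/trunk nets.
    """
    b = np.tanh(B @ u_sensors)   # branch features b_k(u)
    t = np.tanh(T @ y)           # trunk features t_k(y)
    return float(b @ t)          # inner product over the p latent channels

rng = np.random.default_rng(1)
m, d, p = 16, 2, 8
B = rng.normal(size=(p, m))
T = rng.normal(size=(p, d))
u = np.sin(np.linspace(0.0, np.pi, m))   # an input function on the sensors
out = deeponet_forward(u, np.array([0.3, 0.7]), B, T)
```

Note the separation: the branch net sees only the input function, the trunk net only the query point, so one forward pass of the branch serves every query location.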
( 2
min )
Unsupervised video object learning seeks to decompose video scenes into
structural object representations without any supervision from depth, optical
flow, or segmentation. We present VONet, an innovative approach that is
inspired by MONet. While utilizing a U-Net architecture, VONet employs an
efficient and effective parallel attention inference process, generating
attention masks for all slots simultaneously. Additionally, to enhance the
temporal consistency of each mask across consecutive video frames, VONet
develops an object-wise sequential VAE framework. The integration of these
innovative encoder-side techniques, in conjunction with an expressive
transformer-based decoder, establishes VONet as the leading unsupervised method
for object learning across five MOVI datasets, encompassing videos of diverse
complexities. Code is available at https://github.com/hnyu/vonet.
( 2
min )
Document set expansion (DSE) aims to identify relevant documents from a large
collection based on a small set of documents on a fine-grained topic.
Previous work shows that PU learning is a promising method for this task.
However, some serious issues remain unresolved, namely the typical challenges
PU methods face, such as an unknown class prior and imbalanced data, and the
need for transductive experimental settings. In this paper, we propose a novel PU
learning framework based on density estimation, called puDE, that can handle
the above issues. The advantage of puDE is that it neither constrained to the
SCAR assumption and nor require any class prior knowledge. We demonstrate the
effectiveness of the proposed method using a series of real-world datasets and
conclude that our method is a better alternative for the DSE task.
( 2
min )
Constructing first principles models is a challenging task for nonlinear and
complex systems such as a wastewater treatment unit. In recent years,
data-driven models have been widely used to overcome this complexity. However,
they often suffer from issues such as missing, low-quality, or noisy data.
Transfer learning is a solution to this problem, in which knowledge from
another task is transferred to a target task to increase prediction
performance. In this work, the objective is to increase the prediction
performance of an industrial wastewater treatment plant by transferring the
knowledge of (i) an open-source simulation model that captures the underlying
physics of the process, albeit with dissimilarities to the target plant, (ii)
another industrial plant characterized by noisy and limited data but located
in the same refinery, and (iii) the model in (ii), while making the objective
function of the training problem physics-informed using physics information
derived from the open-source model in (i). The results show that test and
validation performance are improved by up to 27% and 59%, respectively.
( 2
min )
Ensemble defenses are widely employed in various security-related
applications to enhance model performance and robustness. The widespread
adoption of these techniques also raises many questions: Are general ensemble
defenses guaranteed to be more robust than individual models? Will stronger adaptive
attacks defeat existing ensemble defense strategies as the cybersecurity arms
race progresses? Can ensemble defenses achieve adversarial robustness to
different types of attacks simultaneously and resist the continually adjusted
adaptive attacks? Unfortunately, these critical questions remain unresolved as
there are no platforms for comprehensive evaluation of ensemble adversarial
attacks and defenses in the cybersecurity domain. In this paper, we propose a
general Cybersecurity Adversarial Robustness Evaluation (CARE) platform aiming
to bridge this gap.
( 2
min )
Variational families with full-rank covariance approximations are known not
to work well in black-box variational inference (BBVI), both empirically and
theoretically. In fact, recent computational complexity results for BBVI have
established that full-rank variational families scale poorly with the
dimensionality of the problem compared to, e.g., mean-field families. This is
particularly critical to hierarchical Bayesian models with local variables;
their dimensionality increases with the size of the datasets. Consequently, one
gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on
the dataset size $N$. In this paper, we explore a theoretical middle ground
between mean-field variational families and full-rank families: structured
variational families. We rigorously prove that certain scale matrix structures
can achieve a better iteration complexity of $\mathcal{O}(N)$, implying better
scaling with respect to $N$. We empirically verify our theoretical results on
large-scale hierarchical models.
( 2
min )
The applicability of widely adopted machine learning (ML) methods to
classification is circumscribed by the imperatives of explicability and
uncertainty, particularly evident in domains such as healthcare, behavioural
sciences, and finance, wherein accountability assumes priority. Recently, the
Small and Incomplete Dataset Analyser (SaNDA) has been proposed to enhance the
ability to perform classification in such domains, by developing a data
abstraction protocol using a ROC curve-based method. This paper focuses on
column-wise data transformations called abstractions, which are crucial for
SaNDA's classification process and explores alternative abstractions protocols,
such as constant binning and quantiles. The best-performing methods have been
compared against Random Forest as a baseline for explainable methods. The
results suggest that SaNDA can be a viable substitute for Random Forest when
data is incomplete, even with minimal missing values. It consistently maintains
high accuracy even when half of the dataset is missing, unlike Random Forest
which experiences a significant decline in accuracy under similar conditions.
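The two alternative abstraction protocols mentioned, constant (equal-width) binning and quantile binning, can be sketched as column-wise transformations; this is a generic illustration, not SaNDA's exact implementation.

```python
import numpy as np

def constant_binning(col, n_bins):
    """Abstract a numeric column into equal-width bins (labels 0..n_bins-1)."""
    edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
    return np.digitize(col, edges)

def quantile_binning(col, n_bins):
    """Abstract a numeric column into equal-frequency (quantile) bins."""
    edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(col, edges)

# A skewed column: an outlier shows how the two protocols differ.
col = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 100.0])
const_bins = constant_binning(col, 2)   # outlier dominates the bin width
quant_bins = quantile_binning(col, 2)   # bins balanced by frequency
```

On skewed data, equal-width bins lump most values together while quantile bins keep them balanced, which is why the choice of abstraction protocol matters for downstream classification.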
( 2
min )
Recently proposed methods for implicitly representing signals such as images,
scenes, or geometries using coordinate-based neural network architectures often
do not leverage the choice of activation functions, or do so only to a limited
extent. In this paper, we introduce the Hyperbolic Oscillation function (HOSC),
a novel activation function with a controllable sharpness parameter. Unlike any
previous activations, HOSC has been specifically designed to better capture
sudden changes in the input signal, and hence sharp or acute features of the
underlying data, as well as smooth low-frequency transitions. Due to its
simplicity and modularity, HOSC offers a plug-and-play functionality that can
be easily incorporated into any existing method employing a neural network as a
way of implicitly representing a signal. We benchmark HOSC against other
popular activations in an array of general tasks, empirically showing an
improvement in the quality of obtained representations, provide the
mathematical motivation behind the efficacy of HOSC, and discuss its
limitations.
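Assuming the commonly reported form $\mathrm{HOSC}(x) = \tanh(\beta \sin x)$ (treat the exact parameterization as an assumption here), the sketch below shows how the sharpness parameter trades smooth oscillation for square-wave-like sharp transitions.

```python
import numpy as np

def hosc(x, beta):
    """Hyperbolic Oscillation activation, assumed form tanh(beta * sin(x)).

    Larger beta pushes the periodic wave toward a square wave, which is
    suited to sharp signal features; small beta stays near-sinusoidal.
    """
    return np.tanh(beta * np.sin(x))

x = np.linspace(-np.pi, np.pi, 5)
smooth = hosc(x, beta=0.5)   # gentle, near-sinusoidal response
sharp = hosc(x, beta=50.0)   # saturates quickly: near-square wave
```

Because $\beta$ is a plain scalar, it can be fixed, scheduled, or even learned per layer without changing the surrounding architecture, matching the plug-and-play claim.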
( 2
min )
Graph Neural Networks (GNNs) have demonstrated remarkable success in modeling
complex relationships in graph-structured data. A recent innovation in this
field is the family of Differential Equation-Inspired Graph Neural Networks
(DE-GNNs), which leverage principles from continuous dynamical systems to model
information flow on graphs with built-in properties such as feature smoothing
or preservation. However, existing DE-GNNs rely on first or second-order
temporal dependencies. In this paper, we propose a neural extension to those
pre-defined temporal dependencies. We show that our model, called TDE-GNN, can
capture a wide range of temporal dynamics that go beyond typical first or
second-order methods, and provide use cases where existing temporal models are
challenged. We demonstrate the benefit of learning the temporal dependencies
using our method rather than using pre-defined temporal dynamics on several
graph benchmarks.
( 2
min )
In order to efficiently explore the chemical space of all possible small
molecules, a common approach is to compress the dimension of the system to
facilitate downstream machine learning tasks. Towards this end, we present a
data driven approach for clustering potential energy landscapes of molecular
structures by applying recently developed Network Embedding techniques, to
obtain latent variables defined through the embedding function. To scale up the
method, we also incorporate an entropy sensitive adaptive scheme for
hierarchical sampling of the energy landscape, based on Metadynamics and
Transition Path Theory. By taking into account the kinetic information implied
by a system's energy landscape, we are able to interpret dynamical node-node
relationships in reduced dimensions. We demonstrate the framework through
Lennard-Jones (LJ) clusters and a human DNA sequence.
( 2
min )
Speaker embeddings carry valuable emotion-related information, which makes
them a promising resource for enhancing speech emotion recognition (SER),
especially with limited labeled data. Traditionally, it has been assumed that
emotion information is indirectly embedded within speaker embeddings, leading
to their under-utilization. Our study reveals a direct and useful link between
emotion and state-of-the-art speaker embeddings in the form of intra-speaker
clusters. By conducting a thorough clustering analysis, we demonstrate that
emotion information can be readily extracted from speaker embeddings. In order
to leverage this information, we introduce a novel contrastive pretraining
approach applied to emotion-unlabeled data for speech emotion recognition. The
proposed approach involves the sampling of positive and negative examples
based on the intra-speaker clusters of speaker embeddings. The proposed
strategy, which leverages extensive emotion-unlabeled data, leads to a
significant improvement in SER performance, whether employed as a standalone
pretraining task or integrated into a multi-task pretraining setting.
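A hedged sketch of the cluster-based sampling step: given precomputed intra-speaker cluster labels (the clustering itself, e.g. k-means on speaker embeddings, is assumed to have been run already), positives share the anchor's cluster and negatives do not.

```python
import random

def sample_contrastive_triplets(cluster_labels, n_triplets, seed=0):
    """Sample (anchor, positive, negative) index triplets from cluster labels.

    Positives come from the anchor's intra-speaker cluster, negatives
    from a different cluster; only the sampling step is sketched here.
    """
    rng = random.Random(seed)
    by_cluster = {}
    for idx, c in enumerate(cluster_labels):
        by_cluster.setdefault(c, []).append(idx)
    clusters = list(by_cluster)
    triplets = []
    for _ in range(n_triplets):
        c_pos = rng.choice([c for c in clusters if len(by_cluster[c]) >= 2])
        anchor, positive = rng.sample(by_cluster[c_pos], 2)
        c_neg = rng.choice([c for c in clusters if c != c_pos])
        negative = rng.choice(by_cluster[c_neg])
        triplets.append((anchor, positive, negative))
    return triplets

labels = [0, 0, 0, 1, 1, 2, 2, 2]   # hypothetical intra-speaker clusters
triplets = sample_contrastive_triplets(labels, n_triplets=4)
```

These triplets then feed a standard contrastive loss, requiring no emotion labels at all, which is what makes the pretraining applicable to emotion-unlabeled data.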
( 2
min )
We sample from a given target distribution by constructing a neural network
which maps samples from a simple reference, e.g. the standard normal
distribution, to samples from the target. To that end, we propose using a
neural network architecture inspired by the Langevin Monte Carlo (LMC)
algorithm. Based on LMC perturbation results, we show approximation rates of
the proposed architecture for smooth, log-concave target distributions measured
in the Wasserstein-$2$ distance. The analysis heavily relies on the notion of
sub-Gaussianity of the intermediate measures of the perturbed LMC process. In
particular, we derive bounds on the growth of the intermediate variance proxies
under different assumptions on the perturbations. Moreover, we propose an
architecture similar to deep residual neural networks and derive expressivity
results for approximating the sample to target distribution map.
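The LMC step the architecture mirrors is $x \leftarrow x - \eta \nabla U(x) + \sqrt{2\eta}\,\xi$; the sketch below runs the plain sampler on a standard normal target ($U(x) = x^2/2$) to show the update that the network layers are modeled after. Step size and chain length are illustrative choices.

```python
import numpy as np

def lmc_chain(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin Monte Carlo on a 1-D potential U.

    Each iteration performs x <- x - step * grad_U(x) + sqrt(2*step) * xi,
    the perturbed Langevin step that the proposed architecture emulates.
    """
    x = x0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.normal()
        samples[i] = x
    return samples

rng = np.random.default_rng(0)
# Standard normal target: U(x) = x^2 / 2, so grad_U(x) = x.
samples = lmc_chain(lambda x: x, x0=0.0, step=0.1, n_steps=20000, rng=rng)
```

After burn-in, the chain's empirical mean and variance approach those of the standard normal target (up to discretization bias from the finite step size).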
( 2
min )
We establish finite-sample guarantees for efficient proper learning of
bounded-degree polytrees, a rich class of high-dimensional probability
distributions and a subclass of Bayesian networks, a widely-studied type of
graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample
guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees.
We extend their results by providing an efficient algorithm which learns
$d$-polytrees in polynomial time and sample complexity for any bounded $d$ when
the underlying undirected graph (skeleton) is known. We complement our
algorithm with an information-theoretic sample complexity lower bound, showing
that the dependence on the dimension and target accuracy parameters are nearly
tight.
( 2
min )
Stochastic differential equations (SDEs) have been widely used to model real
world random phenomena. Existing works mainly focus on the case where the time
series is modeled by a single SDE, which might be restrictive for modeling time
series with distributional shift. In this work, we propose a change point
detection algorithm for time series modeled as neural SDEs. Given a time series
dataset, the proposed method jointly learns the unknown change points and the
parameters of distinct neural SDE models corresponding to each change point.
Specifically, the SDEs are learned under the framework of generative
adversarial networks (GANs) and the change points are detected based on the
output of the GAN discriminator in a forward pass. At each step of the proposed
algorithm, the change points and the SDE model parameters are updated in an
alternating fashion. Numerical results on both synthetic and real datasets are
provided to validate the performance of our algorithm in comparison to
classical change point detection benchmarks, standard GAN-based neural SDEs,
and other state-of-the-art deep generative models for time series data.
( 2
min )
In this paper, we present conditions for identifying the generator of a
linear stochastic differential equation (SDE) from the distribution of its
solution process with a given fixed initial state. These identifiability
conditions are crucial in causal inference using linear SDEs as they enable the
identification of the post-intervention distributions from its observational
distribution. Specifically, we derive a sufficient and necessary condition for
identifying the generator of linear SDEs with additive noise, as well as a
sufficient condition for identifying the generator of linear SDEs with
multiplicative noise. We show that the conditions derived for both types of
SDEs are generic. Moreover, we offer geometric interpretations of the derived
identifiability conditions to enhance their understanding. To validate our
theoretical results, we perform a series of simulations, which support and
substantiate the established findings.
( 2
min )
In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.
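As a concrete illustration of the plug-in idea, here is a minimal sketch. It assumes the intrinsic moment norm takes the form of a maximum over normalized even moments, $\max_k (\mathbb{E}[X^{2k}]/(2k-1)!!)^{1/(2k)}$, which is our reading of the construction, not the paper's exact estimator; for a centered Gaussian every normalized moment equals the standard deviation, so the plug-in recovers it:

```python
import numpy as np

def intrinsic_moment_norm(x, k_max=5):
    """Plug-in estimate of a sub-Gaussian intrinsic moment norm:
    maximize (E[X^{2k}] / (2k-1)!!)^{1/(2k)} over k, using empirical moments.
    For a centered Gaussian this recovers the standard deviation for every k."""
    x = np.asarray(x, dtype=float)
    best, double_fact = 0.0, 1.0
    for k in range(1, k_max + 1):
        double_fact *= 2 * k - 1            # (2k-1)!! = 1 * 3 * ... * (2k-1)
        m2k = np.mean(x ** (2 * k))         # empirical 2k-th moment
        best = max(best, (m2k / double_fact) ** (1.0 / (2 * k)))
    return best

rng = np.random.default_rng(0)
sample = rng.normal(0.0, 2.0, size=200_000)   # sigma = 2
print(round(intrinsic_moment_norm(sample), 2))
```

Capping `k_max` keeps the high-order empirical moments from dominating the estimate at finite sample sizes, in the spirit of the robust plug-in approach the abstract mentions.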
( 2
min )
The posterior collapse phenomenon in variational autoencoder (VAE), where the
variational posterior distribution closely matches the prior distribution, can
hinder the quality of the learned latent variables. As a consequence of
posterior collapse, the latent variables extracted by the encoder in VAE
preserve less information from the input data and thus fail to produce
meaningful representations as input to the reconstruction process in the
decoder. While this phenomenon has been an actively addressed topic related to
VAE performance, the theory for posterior collapse remains underdeveloped,
especially beyond the standard VAE. In this work, we advance the theoretical
understanding of posterior collapse to two important and prevalent yet less
studied classes of VAE: conditional VAE and hierarchical VAE. Specifically, via
a non-trivial theoretical analysis of linear conditional VAE and hierarchical
VAE with two levels of latent variables, we prove that the causes of posterior
collapse in these models include the correlation between the input and output of the
conditional VAE and the effect of learnable encoder variance in the
hierarchical VAE. We empirically validate our theoretical findings for linear
conditional and hierarchical VAE and demonstrate that these results are also
predictive for non-linear cases with extensive experiments.
( 3
min )
We propose a method for estimation and inference for bounds for heterogeneous
causal effect parameters in general sample selection models where the treatment
can affect whether an outcome is observed and no exclusion restrictions are
available. The method provides conditional effect bounds as functions of policy
relevant pre-treatment variables. It allows for conducting valid statistical
inference on the unidentified conditional effects. We use a flexible
debiased/double machine learning approach that can accommodate non-linear
functional forms and high-dimensional confounders. Easily verifiable high-level
conditions for estimation, misspecification robust confidence intervals, and
uniform confidence bands are provided as well. We re-analyze data from a large
scale field experiment on Facebook on counter-attitudinal news subscription
with attrition. Our method yields substantially tighter effect bounds compared
to conventional methods and suggests depolarization effects for younger users.
( 2
min )
We study online multiclass classification under bandit feedback. We extend
the results of Daniely and Helbertal [2013] by showing that the finiteness of
the Bandit Littlestone dimension is necessary and sufficient for bandit online
learnability even when the label space is unbounded. Moreover, we show that,
unlike the full-information setting, sequential uniform convergence is
necessary but not sufficient for bandit online learnability. Our result
complements the recent work by Hanneke, Moran, Raman, Subedi, and Tewari [2023]
who show that the Littlestone dimension characterizes online multiclass
learnability in the full-information setting even when the label space is
unbounded.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that estimates probability distributions using
a functional analytic approach: first, it finds a smooth functional estimate of
the probability distribution, whether it is normalized or not; second, the
algorithm provides an estimate of the normalizing weight; and third, the
algorithm proposes a new computation scheme to compute such estimates.
A core component of OPAA is a special transform of the square root of the
joint distribution into a functional space of our own construction. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. Computations can be parallelized and completed
in one pass.
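The Parseval identity behind this claim can be checked in one dimension. The sketch below uses an orthonormal Hermite-function basis as a stand-in for the paper's functional space (our assumption, not the authors' construction): the squared projection coefficients of $\sqrt{\tilde p}$ sum to the evidence $Z = \int \tilde p$.

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi, sqrt

# Unnormalized target density; its true evidence is Z = sqrt(2*pi)
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
p_tilde = np.exp(-0.5 * (x - 1.0) ** 2)
f = np.sqrt(p_tilde)                 # transform the *square root* of the density

Z_hat = 0.0
for j in range(30):
    c = np.zeros(j + 1)
    c[j] = 1.0
    # Orthonormal Hermite function psi_j(x) = H_j(x) e^{-x^2/2} / sqrt(2^j j! sqrt(pi))
    psi = hermval(x, c) * np.exp(-0.5 * x ** 2) / sqrt(2.0 ** j * factorial(j) * sqrt(pi))
    coef = np.sum(psi * f) * dx      # projection coefficient <psi_j, sqrt(p_tilde)>
    Z_hat += coef ** 2               # Parseval: sum of squares recovers the L^2 norm

print(round(Z_hat, 4), round(sqrt(2 * pi), 4))
```

Each coefficient is an independent quadrature, which is what makes the computation parallelizable and one-pass.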
OPAA can be applied broadly to the estimation of probability density
functions. In Bayesian problems, it can be applied to estimating the
normalizing weight of the posterior, which is also known as the evidence,
serving as an alternative to existing optimization-based methods.
( 2
min )
In recent times machine learning methods have made significant advances in
becoming a useful tool for analyzing physical systems. A particularly active
area in this theme has been "physics-informed machine learning" which focuses
on using neural nets for numerically solving differential equations. In this
work, we aim to advance the theory of measuring out-of-sample error while
training DeepONets -- which is among the most versatile ways to solve PDE
systems in one-shot.
Firstly, for a class of DeepONets, we prove a bound on their Rademacher
complexity which does not explicitly scale with the width of the nets involved.
Secondly, we use this to show how the Huber loss can be chosen so that for
these DeepONet classes generalization error bounds can be obtained that have no
explicit dependence on the size of the nets. We note that our theoretical
results apply to any PDE being targeted to be solved by DeepONets.
( 2
min )
Identifying important features linked to a response variable is a fundamental
task in various scientific domains. This article explores statistical inference
for simulated Markov random fields in high-dimensional settings. We introduce a
methodology based on Markov Chain Monte Carlo Maximum Likelihood Estimation
(MCMC-MLE) with Elastic-net regularization. Under mild conditions on the MCMC
method, our penalized MCMC-MLE method achieves $\ell_{1}$-consistency. We
propose a decorrelated score test, establishing both its asymptotic normality
and that of a one-step estimator, along with the associated confidence
interval. Furthermore, we construct two false discovery rate control procedures
via the asymptotic behaviors for both p-values and e-values. Comprehensive
numerical simulations confirm the theoretical validity of the proposed methods.
( 2
min )
The manifold scattering transform is a deep feature extractor for data
defined on a Riemannian manifold. It is one of the first examples of extending
convolutional neural network-like operators to general manifolds. The initial
work on this model focused primarily on its theoretical stability and
invariance properties but did not provide methods for its numerical
implementation except in the case of two-dimensional surfaces with predefined
meshes. In this work, we present practical schemes, based on the theory of
diffusion maps, for implementing the manifold scattering transform to datasets
arising in naturalistic systems, such as single cell genetics, where the data
is a high-dimensional point cloud modeled as lying on a low-dimensional
manifold. We show that our methods are effective for signal classification and
manifold classification tasks.
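A hedged sketch of the diffusion-maps ingredient (a generic construction, not the authors' exact pipeline): build a Gaussian affinity on the point cloud, normalize away sampling density, and row-normalize to obtain a Markov diffusion operator whose powers can then play the role of scattering filters.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy point cloud on a circle: a 1-D manifold embedded in R^2
t = rng.uniform(0, 2 * np.pi, size=300)
X = np.stack([np.cos(t), np.sin(t)], axis=1)

# Diffusion-maps construction: Gaussian kernel -> density-normalized Markov operator
eps = 0.1
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
K = np.exp(-D2 / eps)
q = K.sum(1)
K_alpha = K / np.outer(q, q)                  # alpha = 1 normalization removes density effects
P = K_alpha / K_alpha.sum(1, keepdims=True)   # row-stochastic diffusion operator

# Scattering-style band-pass filters can be formed from differences of powers of P
evals = np.linalg.eigvals(P)
print(round(np.max(evals.real), 4))   # leading eigenvalue of a Markov operator is 1
```

Signals on the point cloud are then filtered by repeated application of `P`, which approximates heat diffusion on the underlying manifold.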
( 2
min )
We explore a stochastic contextual linear bandit problem where the agent
observes a noisy, corrupted version of the true context through a noise channel
with an unknown noise parameter. Our objective is to design an action policy
that can "approximate" that of an oracle, which has access to the reward model,
the channel parameter, and the predictive distribution of the true context from
the observed noisy context. In a Bayesian framework, we introduce a Thompson
sampling algorithm for Gaussian bandits with Gaussian context noise. Adopting
an information-theoretic analysis, we characterize the Bayesian regret of our
algorithm relative to the oracle's action policy. We also extend this problem to
a scenario where the agent observes the true context with some delay after
receiving the reward and show that delayed true contexts lead to lower Bayesian
regret. Finally, we empirically demonstrate the performance of the proposed
algorithms against baselines.
( 2
min )
Approximate Thompson sampling with Langevin Monte Carlo broadens its reach
from Gaussian posterior sampling to encompass more general smooth posteriors.
However, it still encounters scalability issues in high-dimensional problems
when demanding high accuracy. To address this, we propose an approximate
Thompson sampling strategy, utilizing underdamped Langevin Monte Carlo, where
the latter is the go-to workhorse for simulations of high-dimensional
posteriors. Based on the standard smoothness and log-concavity conditions, we
study the accelerated posterior concentration and sampling using a specific
potential function. This design improves the sample complexity for realizing
logarithmic regrets from $\tilde{\mathcal{O}}(d)$ to
$\tilde{\mathcal{O}}(\sqrt{d})$. The scalability and robustness of our algorithm are also
empirically validated through synthetic experiments in high-dimensional bandit
problems.
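A minimal sketch of an underdamped (kinetic) Langevin sampler on a toy one-dimensional Gaussian posterior, with illustrative friction and step size rather than the paper's tuned algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)

def grad_U(theta):
    """Gradient of the potential U(theta) = (theta - 3)^2 / 2, target N(3, 1)."""
    return theta - 3.0

gamma, h = 2.0, 0.05          # friction and step size (illustrative values)
theta, v = 0.0, 0.0
samples = []
for k in range(100_000):
    # Euler discretization of the kinetic Langevin dynamics:
    # velocity gets friction, gradient force, and injected noise; position follows
    v = v - h * gamma * v - h * grad_U(theta) + np.sqrt(2 * gamma * h) * rng.normal()
    theta = theta + h * v
    if k >= 20_000:           # discard burn-in
        samples.append(theta)

print(round(np.mean(samples), 2), round(np.var(samples), 2))
```

The velocity variable carries momentum between steps, which is what yields the faster mixing in high dimensions that the abstract exploits for posterior sampling inside Thompson sampling.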
( 2
min )
There has been considerable recent interest in estimating heterogeneous
causal effects. In this paper, we introduce conditional average partial causal
effects (CAPCE) to reveal the heterogeneity of causal effects with continuous
treatment. We provide conditions for identifying CAPCE in an instrumental
variable setting. We develop three families of CAPCE estimators: sieve,
parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze
their statistical properties. We illustrate the proposed CAPCE estimators on
synthetic and real-world data.
( 2
min )
Methods for estimating heterogeneous treatment effects (HTE) from
observational data have largely focused on continuous or binary outcomes, with
less attention paid to survival outcomes and almost none to settings with
competing risks. In this work, we develop censoring unbiased transformations
(CUTs) for survival outcomes both with and without competing risks. After
converting time-to-event outcomes using these CUTs, direct application of HTE
learners for continuous outcomes yields consistent estimates of heterogeneous
cumulative incidence effects, total effects, and separable direct effects. Our
CUTs enable application of a much larger set of state-of-the-art HTE learners
for censored outcomes than had previously been available, especially in
competing risks settings. We provide generic model-free learner-specific oracle
inequalities bounding the finite-sample excess risk. The oracle efficiency
results depend on the oracle selector and estimated nuisance functions from all
steps involved in the transformation. We demonstrate the empirical performance
of the proposed methods in simulation studies.
( 2
min )
Variational families with full-rank covariance approximations are known not
to work well in black-box variational inference (BBVI), both empirically and
theoretically. In fact, recent computational complexity results for BBVI have
established that full-rank variational families scale poorly with the
dimensionality of the problem compared to e.g. mean field families. This is
particularly critical to hierarchical Bayesian models with local variables;
their dimensionality increases with the size of the datasets. Consequently, one
gets an iteration complexity with an explicit $\mathcal{O}(N^2)$ dependence on
the dataset size $N$. In this paper, we explore a theoretical middle ground
between mean-field variational families and full-rank families: structured
variational families. We rigorously prove that certain scale matrix structures
can achieve a better iteration complexity of $\mathcal{O}(N)$, implying better
scaling with respect to $N$. We empirically verify our theoretical results on
large-scale hierarchical models.
( 2
min )
Quantum machine learning, which involves running machine learning algorithms
on quantum devices, has garnered significant attention in both academic and
business circles. In this paper, we offer a comprehensive and unbiased review
of the various concepts that have emerged in the field of quantum machine
learning. This includes techniques used in Noisy Intermediate-Scale Quantum
(NISQ) technologies and approaches for algorithms compatible with
fault-tolerant quantum computing hardware. Our review covers fundamental
concepts, algorithms, and the statistical learning theory pertinent to quantum
machine learning.
( 2
min )
This paper addresses second-order stochastic optimization for estimating the
minimizer of a convex function written as an expectation. A direct recursive
estimation technique for the inverse Hessian matrix using a Robbins-Monro
procedure is introduced. This approach drastically reduces computational
complexity. Above all, it allows the development of universal stochastic Newton
methods and an investigation of the asymptotic efficiency of the proposed
approach. This work thus expands the application scope of second-order
algorithms in stochastic optimization.
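The recursive inverse-Hessian idea can be sketched on a toy least-squares problem where each stochastic Hessian sample is rank one, so the running inverse can be maintained with a Sherman-Morrison update. This is our illustrative reconstruction under that rank-one assumption, not the paper's exact recursion:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_star = np.array([1.0, -2.0, 0.5])   # minimizer of E[(a^T theta - b)^2]

theta = np.zeros(3)
S_inv = np.eye(3)   # inverse of S_n = I + sum of stochastic Hessian samples
for n in range(50_000):
    a = rng.normal(size=3)
    b = a @ theta_star + 0.1 * rng.normal()
    # Stochastic Hessian sample is 2 * a a^T (rank one): update S_inv in O(d^2)
    # via Sherman-Morrison instead of re-inverting a d x d matrix each step.
    u = S_inv @ a
    S_inv -= 2.0 * np.outer(u, u) / (1.0 + 2.0 * (a @ u))
    # Newton-type step: since S_n grows like n times the Hessian, S_inv supplies
    # both the curvature correction and an implicit 1/n step size.
    grad = 2.0 * (a @ theta - b) * a
    theta -= S_inv @ grad

print(np.round(theta, 2))
```

The same rank-one structure arises, for example, in logistic regression, which is why such recursive inverse-Hessian schemes avoid any explicit matrix inversion.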
( 2
min )
Machine learning can be overwhelming with its variety of tasks. Most tasks can be solved with a few ML algorithms. You need to be aware of which algorithms to select, when to apply them, what parameters to take into consideration, and how to test them. This guide was crafted to provide you with a straightforward…
The post Choosing the right machine learning algorithm for business success appeared first on Data Science Central.
( 23
min )
The automotive industry is being transformed by the integration of cutting-edge technologies into software-defined cars. At CES, NVIDIA invited industry leaders to share their perspectives on how technology, especially AI and computing power, is shaping the future of transportation. Watch the video to learn more from NVIDIA’s auto partners.
( 6
min )
It’s hard to imagine an industry more competitive — or fast-paced — than online retail. Sellers need to create attractive and informative product listings that must be engaging, capture attention and generate trust. Amazon uses optimized containers on Amazon Elastic Compute Cloud (Amazon EC2) with NVIDIA Tensor Core GPUs to power a generative AI tool.
( 5
min )
MetaOpt helps analyze, explain, and improve heuristic performance before deployment in production systems. Learn how it works, particularly in traffic engineering, packet scheduling, and VM placement.
The post MetaOpt: Examining, explaining, and improving heuristic performance appeared first on Microsoft Research.
( 10
min )
Mode collapse is a significant unsolved issue of generative adversarial
networks. In this work, we examine the causes of mode collapse from a novel
perspective. Due to the nonuniform sampling in the training process, some
sub-distributions may be missed when sampling data. As a result, even when the
generated distribution differs from the real one, the GAN objective can still
achieve the minimum. To address the issue, we propose a global distribution
fitting (GDF) method with a penalty term to confine the generated data
distribution. When the generated distribution differs from the real one, GDF
makes it harder for the objective to reach its minimal value, while the original
global minimum is unchanged. To handle the circumstance where the overall
real data is unreachable, we also propose a local distribution fitting (LDF)
method. Experiments on several benchmarks demonstrate the effectiveness and
competitive performance of GDF and LDF.
( 2
min )
Machine-learned normalizing flows can be used in the context of lattice
quantum field theory to generate statistically correlated ensembles of lattice
gauge fields at different action parameters. This work demonstrates how these
correlations can be exploited for variance reduction in the computation of
observables. Three different proof-of-concept applications are demonstrated
using a novel residual flow architecture: continuum limits of gauge theories,
the mass dependence of QCD observables, and hadronic matrix elements based on
the Feynman-Hellmann approach. In all three cases, it is shown that statistical
uncertainties are significantly reduced when machine-learned flows are
incorporated as compared with the same calculations performed with uncorrelated
ensembles or direct reweighting.
( 2
min )
RNA, whose functionality is largely determined by its structure, plays an
important role in many biological activities. The prediction of pairwise
structural proximity between each nucleotide of an RNA sequence can
characterize the structural information of the RNA. Historically, this problem
has been tackled by machine learning models using expert-engineered features
and trained on scarce labeled datasets. Here, we find that the knowledge
learned by a protein-coevolution Transformer-based deep neural network can be
transferred to the RNA contact prediction task. As protein datasets are orders
of magnitude larger than those for RNA contact prediction, our findings and the
subsequent framework greatly reduce the data scarcity bottleneck. Experiments
confirm that RNA contact prediction through transfer learning using a publicly
available protein model is greatly improved. Our findings indicate that the
learned structural patterns of proteins can be transferred to RNAs, opening up
potential new avenues for research.
( 2
min )
Recent years have seen a surge of interest in the algorithmic estimation of
stochastic entropy production (EP) from trajectory data via machine learning. A
crucial element of such algorithms is the identification of a loss function
whose minimization guarantees the accurate EP estimation. In this study, we
show that there exists a host of loss functions, namely those implementing a
variational representation of the $\alpha$-divergence, which can be used for
the EP estimation. By fixing $\alpha$ to a value between $-1$ and $0$, the
$\alpha$-NEEP (Neural Estimator for Entropy Production) exhibits a much more
robust performance against strong nonequilibrium driving or slow dynamics,
which adversely affects the existing method based on the Kullback-Leibler
divergence ($\alpha = 0$). In particular, the choice of $\alpha = -0.5$ tends
to yield the optimal results. To corroborate our findings, we present an
exactly solvable simplification of the EP estimation problem, whose loss
function landscape and stochastic properties give deeper intuition into the
robustness of the $\alpha$-NEEP.
( 2
min )
This paper presents a new type of hybrid model for Bayesian optimization (BO)
adept at managing mixed variables, encompassing both quantitative (continuous
and integer) and qualitative (categorical) types. Our proposed new hybrid
models (named hybridM) merge the Monte Carlo Tree Search structure (MCTS) for
categorical variables with Gaussian Processes (GP) for continuous ones. hybridM
leverages the upper confidence bound tree search (UCTS) as its MCTS strategy,
showcasing the tree architecture's integration into Bayesian optimization. Our
innovations, including dynamic online kernel selection in the surrogate
modeling phase and a unique UCTS search strategy, position our hybrid models as
an advancement in mixed-variable surrogate models. Numerical experiments
underscore the superiority of hybrid models, highlighting their potential in
Bayesian optimization.
( 2
min )
Computational modeling of artwork meaning is complex and difficult. This is
because art interpretation is multidimensional and highly subjective. This
paper experimentally investigated the degree to which a state-of-the-art Deep
Convolutional Neural Network (DCNN), a popular Machine Learning approach, can
correctly classify modern conceptual artwork into the galleries devised by
art curators. Two hypotheses were proposed to state that the DCNN model uses
Exhibited Properties for classification, like shape and color, but not
Non-Exhibited Properties, such as historical context and artist intention. The
two hypotheses were experimentally validated using a methodology designed for
this purpose. A VGG-11 DCNN pre-trained on the ImageNet dataset and discriminatively
fine-tuned was trained on handcrafted datasets designed from real-world
conceptual photography galleries. Experimental results supported the two
hypotheses showing that the DCNN model ignores Non-Exhibited Properties and
uses only Exhibited Properties for artwork classification. This work points to
current DCNN limitations, which should be addressed by future DNN models.
( 2
min )
Test log-likelihood is commonly used to compare different models of the same
data or different approximate inference algorithms for fitting the same
probabilistic model. We present simple examples demonstrating how comparisons
based on test log-likelihood can contradict comparisons according to other
objectives. Specifically, our examples show that (i) approximate Bayesian
inference algorithms that attain higher test log-likelihoods need not also
yield more accurate posterior approximations and (ii) conclusions about
forecast accuracy based on test log-likelihood comparisons may not agree with
conclusions based on root mean squared error.
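Point (ii) can be reproduced in a few lines with two Gaussian predictive models whose RMSE ranking and test log-likelihood ranking disagree. This is an illustrative construction of ours, not the paper's exact example:

```python
import numpy as np

def gaussian_loglik(y, mu, var):
    """Mean Gaussian log density of held-out points y under N(mu, var)."""
    return np.mean(-0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var))

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50_000)        # held-out data, truly N(0, 1)

# Model A: perfect predictive mean, badly overconfident predictive variance
rmse_a = np.sqrt(np.mean((y - 0.0) ** 2))
ll_a = gaussian_loglik(y, 0.0, 0.05)

# Model B: slightly biased mean, well-calibrated predictive variance
rmse_b = np.sqrt(np.mean((y - 0.2) ** 2))
ll_b = gaussian_loglik(y, 0.2, 1.0)

# A wins on RMSE, B wins on test log-likelihood: the two rankings disagree
print(rmse_a < rmse_b, ll_b > ll_a)
```

The overconfident variance of model A is heavily penalized by the log density but invisible to RMSE, which only sees the predictive mean.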
( 2
min )
In data-driven control and machine learning, a common requirement involves
breaking down large matrices into smaller, low-rank factors that possess
specific levels of sparsity. This paper introduces an innovative solution to
the orthogonal nonnegative matrix factorization (ONMF) problem. The objective
is to approximate input data by using two low-rank nonnegative matrices,
adhering to both orthogonality and $\ell_0$-norm sparsity constraints. The
proposed maximum-entropy-principle based framework ensures orthogonality and
sparsity of features or the mixing matrix, while maintaining nonnegativity in
both. Additionally, the methodology offers a quantitative determination of the
``true'' number of underlying features, a crucial hyperparameter for ONMF.
Experimental evaluation on synthetic and standard datasets highlights the
method's superiority in terms of sparsity, orthogonality, and computational
speed compared to existing approaches. Notably, the proposed method achieves
comparable or improved reconstruction errors in line with the literature.
( 2
min )
Multimodal sentiment analysis aims to identify the emotions expressed by
individuals through visual, language, and acoustic cues. However, most of the
existing research efforts assume that all modalities are available during both
training and testing, making their algorithms susceptible to the missing
modality scenario. In this paper, we propose a novel knowledge-transfer network
to translate between different modalities to reconstruct the missing audio
modalities. Moreover, we develop a cross-modality attention mechanism to retain
the maximal information of the reconstructed and observed modalities for
sentiment prediction. Extensive experiments on three publicly available
datasets demonstrate significant improvements over baselines and achieve
comparable results to the previous methods with complete multi-modality
supervision.
( 2
min )
Given the success of ChatGPT, LaMDA and other large language models (LLMs),
there has been an increase in development and usage of LLMs within the
technology sector and other sectors. While LLMs have not yet surpassed human
intelligence, there will come a time when they do. Such LLMs can be referred to
as advanced LLMs. Currently, there is limited use of ethical artificial
intelligence (AI) principles and guidelines addressing advanced LLMs because we
have not reached that point yet. However, this is a problem: once we do reach
that point, we
will not be adequately prepared to deal with the aftermath of it in an ethical
and optimal way, which will lead to undesired and unexpected consequences. This
paper addresses this issue by discussing what ethical AI principles and
guidelines can be used to address highly advanced LLMs.
( 2
min )
Computational efficiency and adversarial robustness are critical factors in
real-world engineering applications. Yet, conventional neural networks often
fall short in addressing both simultaneously, or even separately. Drawing
insights from natural physical systems and existing literature, it is known
that an input convex architecture enhances computational efficiency, while a
Lipschitz-constrained architecture bolsters adversarial robustness. By
leveraging the strengths of convexity and Lipschitz continuity, we develop a
novel network architecture, termed Input Convex Lipschitz Recurrent Neural
Networks. This model outperforms existing recurrent units across a spectrum of
engineering tasks in terms of computational efficiency and adversarial
robustness. These tasks encompass a benchmark MNIST image classification,
real-world solar irradiance prediction for Solar PV system planning at LHT
Holdings in Singapore, and real-time Model Predictive Control optimization for
a chemical reactor.
( 2
min )
Utilizing task-invariant prior knowledge extracted from related tasks,
meta-learning is a principled framework that empowers learning a new task
especially when data records are limited. A fundamental challenge in
meta-learning is how to quickly "adapt" the extracted prior in order to train a
task-specific model within a few optimization steps. Existing approaches deal
with this challenge using a preconditioner that enhances convergence of the
per-task training process. Though effective in representing locally a quadratic
training loss, these simple linear preconditioners can hardly capture complex
loss geometries. The present contribution addresses this limitation by learning
a nonlinear mirror map, which induces a versatile distance metric to enable
capturing and optimizing a wide range of loss geometries, hence facilitating
the per-task training. Numerical tests on few-shot learning datasets
demonstrate the superior expressiveness and convergence of the advocated
approach.
( 2
min )
Image registration has traditionally been done using two distinct approaches:
learning based methods, relying on robust deep neural networks, and
optimization-based methods, applying complex mathematical transformations to
warp images accordingly. Of course, both paradigms offer advantages and
disadvantages, and, in this work, we seek to combine their respective strengths
into a single streamlined framework, using the outputs of the learning based
method as initial parameters for optimization while prioritizing computational
power for the image pairs that offer the greatest loss. Our investigations
showed improvements of up to 1.6% on test data, while maintaining the same
inference time, and a substantial 1.0 percentage point gain in deformation
field smoothness.
( 2
min )
Transformer-based models excel in speech recognition. Existing efforts to
optimize Transformer inference, typically for long-context applications, center
on simplifying attention score calculations. However, streaming speech
recognition models usually process a limited number of tokens each time, making
attention score calculation less of a bottleneck. Instead, the bottleneck lies
in the linear projection layers of multi-head attention and feedforward
networks, constituting a substantial portion of the model size and contributing
significantly to computation, memory, and power usage.
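A back-of-the-envelope count makes the claim concrete. The layer sizes below are hypothetical, not the paper's models: for a short streaming chunk, the quadratic-in-tokens attention-score term is a tiny fraction of the linear-projection and feedforward cost.

```python
# Hypothetical Transformer layer sizes for a streaming model (illustrative only)
d_model, T = 512, 32                      # model width and tokens per streaming chunk

proj_flops = 4 * T * d_model * d_model    # Q, K, V and output projections
score_flops = 2 * T * T * d_model         # Q K^T scores plus attention-weighted V
ffn_flops = 2 * T * d_model * (4 * d_model)  # two feedforward linears, width 4*d_model

share = score_flops / (proj_flops + score_flops + ffn_flops)
print(f"attention-score share of layer FLOPs: {share:.1%}")
```

With these sizes the score computation is on the order of one percent of the per-layer FLOPs, which is why shrinking the linear layers, as folding attention does, is where the savings are.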
To address this bottleneck, we propose folding attention, a technique
targeting these linear layers, significantly reducing model size and improving
memory and power efficiency. Experiments on on-device Transformer-based
streaming speech recognition models show that folding attention reduces model
size (and corresponding memory consumption) by up to 24% and power consumption
by up to 23%, all without compromising model accuracy or computation overhead.
( 2
min )
Stochastic generators are useful for estimating climate impacts on various
sectors. Projecting climate risk in various sectors, e.g. energy systems,
requires generators that are accurate (statistical resemblance to
ground-truth), reliable (do not produce erroneous examples), and efficient.
Leveraging data from the North American Land Data Assimilation System, we
introduce TemperatureGAN, a Generative Adversarial Network conditioned on
months, locations, and time periods, to generate 2m above ground atmospheric
temperatures at an hourly resolution. We propose evaluation methods and metrics
to measure the quality of generated samples. We show that TemperatureGAN
produces high-fidelity examples with good spatial representation and temporal
dynamics consistent with known diurnal cycles.
( 2
min )
We use explainable neural networks to connect the evolutionary history of
dark matter halos with their density profiles. The network captures independent
factors of variation in the density profiles within a low-dimensional
representation, which we physically interpret using mutual information. Without
any prior knowledge of the halos' evolution, the network recovers the known
relation between the early time assembly and the inner profile, and discovers
that the profile beyond the virial radius is described by a single parameter
capturing the most recent mass accretion rate. The results illustrate the
potential for machine-assisted scientific discovery in complicated
astrophysical datasets.
( 2
min )
Pseudorange errors are the root cause of localization inaccuracy in GPS.
Previous data-driven methods regress and eliminate pseudorange errors using
handcrafted intermediate labels. Unlike them, we propose an end-to-end GPS
localization framework, E2E-PrNet, to train a neural network for pseudorange
correction (PrNet) directly using the final task loss calculated with the
ground truth of GPS receiver states. The gradients of the loss with respect to
learnable parameters are backpropagated through a differentiable nonlinear
least squares optimizer to PrNet. The feasibility is verified with GPS data
collected by Android phones, showing that E2E-PrNet outperforms the
state-of-the-art end-to-end GPS localization methods.
( 2
min )
While colonization has sociohistorically impacted people's identities across
various dimensions, those colonial values and biases continue to be perpetuated
by sociotechnical systems. One category of sociotechnical systems--sentiment
analysis tools--can also perpetuate colonial values and bias, yet less
attention has been paid to how such tools may be complicit in perpetuating
coloniality, although they are often used to guide various practices (e.g.,
content moderation). In this paper, we explore potential bias in sentiment
analysis tools in the context of Bengali communities that have experienced and
continue to experience the impacts of colonialism. Drawing on identity
categories most impacted by colonialism amongst local Bengali communities, we
focused our analytic attention on gender, religion, and nationality. We
conducted an algorithmic audit of all sentiment analysis tools for Bengali,
available on the Python package index (PyPI) and GitHub. Despite similar
semantic content and structure, our analyses showed that in addition to
inconsistencies in output from different tools, Bengali sentiment analysis
tools exhibit bias between different identity categories and respond
differently to different ways of identity expression. Connecting our findings
with colonially shaped sociocultural structures of Bengali communities, we
discuss the implications of downstream bias of sentiment analysis tools.
( 3
min )
This paper investigates the double descent phenomenon in two-layer neural
networks, focusing on the role of L1 regularization and representation
dimensions. It explores an alternative double descent phenomenon, named sparse
double descent. The study emphasizes the complex relationship between model
complexity, sparsity, and generalization, and suggests further research into
more diverse models and datasets. The findings contribute to a deeper
understanding of neural network training and optimization.
( 2
min )
Black-box query-based attacks constitute significant threats to Machine
Learning as a Service (MLaaS) systems since they can generate adversarial
examples without accessing the target model's architecture and parameters.
Traditional defense mechanisms, such as adversarial training, gradient masking,
and input transformations, either impose substantial computational costs or
compromise the test accuracy of non-adversarial inputs. To address these
challenges, we propose an efficient defense mechanism, PuriDefense, that
employs random patch-wise purifications with an ensemble of lightweight
purification models at a low level of inference cost. These models leverage the
local implicit function and rebuild the natural image manifold. Our theoretical
analysis suggests that this approach slows down the convergence of query-based
attacks by incorporating randomness into purifications. Extensive experiments
on CIFAR-10 and ImageNet validate the effectiveness of our proposed
purifier-based defense mechanism, demonstrating significant improvements in
robustness against query-based attacks.
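A minimal sketch of the random patch-wise purification idea, with trivial stand-in "purifiers" (the actual method uses an ensemble of trained local-implicit-function models; the function names and the mean-shrinkage purifier below are invented for illustration):

```python
import numpy as np

def patchwise_purify(image, purifiers, patch=4, rng=None):
    """Split the image into patches and run each patch through a purifier
    chosen at random from a small ensemble, then reassemble the image.
    The per-patch randomness is what injects noise into the defense."""
    rng = rng or np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            f = purifiers[rng.integers(len(purifiers))]
            out[i:i+patch, j:j+patch] = f(image[i:i+patch, j:j+patch])
    return out

# Stand-in purifiers: a light mean-shrinkage "denoiser" and the identity.
purifiers = [
    lambda p: 0.8 * p + 0.2 * p.mean(),
    lambda p: p,
]
img = np.random.default_rng(1).normal(size=(8, 8))
purified = patchwise_purify(img, purifiers, patch=4)
```

Because the purifier assignment is re-sampled per query, a query-based attacker sees a slightly different function each time, which is the mechanism the theoretical analysis argues slows attack convergence.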
( 2
min )
With the steady rise of the use of AI in bio-technical applications and the
widespread adoption of genomics sequencing, an increasing number of AI-based
algorithms and tools are entering the research and production stage, affecting
critical decision-making streams like drug discovery and clinical outcomes.
This paper demonstrates the vulnerability of AI models often utilized in
downstream tasks on recognized public genomics datasets. We undermine model
robustness by deploying an attack that focuses on input transformation while
mimicking the real data and confusing the model decision-making, ultimately
yielding a pronounced deterioration in model performance. Further, we enhance
our approach by generating poisoned data using a variational autoencoder-based
model. Our empirical findings unequivocally demonstrate a decline in model
performance, underscored by diminished accuracy and an upswing in false
positives and false negatives. Furthermore, we analyze the resulting
adversarial samples via spectral analysis yielding conclusions for
countermeasures against such attacks.
( 2
min )
Fair machine learning aims to prevent discrimination against individuals or
sub-populations based on sensitive attributes such as gender and race. In
recent years, causal inference methods have been increasingly used in fair
machine learning to measure unfairness by causal effects. However, current
methods assume that the true causal graph is given, which is often not true in
real-world applications. To address this limitation, this paper proposes a
framework for achieving causal fairness based on the notion of interventions
when the true causal graph is partially known. The proposed approach involves
modeling fair prediction using a Partially Directed Acyclic Graph (PDAG),
specifically, a class of causal DAGs that can be learned from observational
data combined with domain knowledge. The PDAG is used to measure causal
fairness, and a constrained optimization problem is formulated to balance
between fairness and accuracy. Results on both simulated and real-world
datasets demonstrate the effectiveness of this method.
( 2
min )
Neural networks with quadratic decision functions have been introduced as
alternatives to standard neural networks with affine linear ones. They are
advantageous when the objects to be identified have compact basic geometries
like circles, ellipses, etc. In this paper we investigate the use of such
ansatz functions for classification. In particular, we test and compare the
algorithm on the MNIST dataset for classification of handwritten digits and for
classification of subspecies. We also show that the implementation can be
based on the neural network structures of the software packages TensorFlow and
Keras.
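As a hedged illustration (the circle example and function names are mine, not the paper's trained model), a quadratic decision unit computing $x^\top A x + b^\top x + c$ can separate a disc exactly, which an affine linear unit cannot:

```python
import numpy as np

rng = np.random.default_rng(0)

def quadratic_unit(X, A, b, c):
    """Quadratic decision function x^T A x + b^T x + c, one value per row of X."""
    return np.einsum("ni,ij,nj->n", X, A, X) + X @ b + c

# A unit whose zero level set is the unit circle: x^2 + y^2 - 1 = 0.
A, b, c = np.eye(2), np.zeros(2), -1.0
X = rng.uniform(-2, 2, size=(500, 2))
inside_true = (X ** 2).sum(axis=1) < 1.0
inside_pred = quadratic_unit(X, A, b, c) < 0.0
accuracy = (inside_true == inside_pred).mean()  # should be (near-)perfect
```

In a trained network, $A$, $b$, and $c$ would be learned parameters rather than hand-set as above.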
( 2
min )
This paper discusses the feasibility of using Large Language Models (LLMs) for
code generation, with a particular application to designing a RISC processor.
The paper also reviews the associated steps, such as parsing, tokenization,
encoding, the attention mechanism, sampling of tokens, and iterations during
code generation. The generated code for the RISC components is verified through
testbenches and hardware implementation on an FPGA board. Four metrics (correct
output on the first iteration, number of errors embedded in the code, number of
trials required to achieve the code, and failure to generate the code after
three iterations) are used to compare the efficiency of using LLMs in
programming. In all the cases, the generated code had significant errors, and
human intervention was always required to fix the bugs. LLMs can therefore be
used to complement a programmer's code design.
( 2
min )
The use of low-rank adaptation (LoRA) with frozen pretrained language models
(PLMs) has become increasingly popular as a mainstream, resource-efficient
modeling approach for memory-constrained hardware. In this study, we first
explore how to enhance model performance by introducing various LoRA training
strategies, achieving relative word error rate reductions of 3.50\% on the
public Librispeech dataset and of 3.67\% on an internal dataset in the
messaging domain. To further characterize the stability of LoRA-based
second-pass speech recognition models, we examine robustness against input
perturbations. These perturbations are rooted in homophone replacements and a
novel metric called N-best Perturbation-based Rescoring Robustness (NPRR), both
designed to measure the relative degradation in the performance of rescoring
models. Our experimental results indicate that while advanced variants of LoRA,
such as dynamic rank-allocated LoRA, lead to performance degradation in
$1$-best perturbation, they alleviate the degradation in $N$-best perturbation.
This finding is in comparison to fully-tuned models and vanilla LoRA tuning
baselines, suggesting that a comprehensive selection is needed when using
LoRA-based adaptation for compute-cost savings and robust language modeling.
( 3
min )
Deep learning still has drawbacks in terms of trustworthiness, which
encompasses a comprehensible, fair, safe, and reliable methodology. To mitigate
the potential risk of AI, clear obligations associated with trustworthiness
have been proposed via regulatory guidelines, e.g., in the European AI Act.
Therefore, a
central question is to what extent trustworthy deep learning can be realized.
Establishing the described properties constituting trustworthiness requires
that the factors influencing an algorithmic computation can be retraced, i.e.,
the algorithmic implementation is transparent. Motivated by the observation
that the current evolution of deep learning models necessitates a change in
computing technology, we derive a mathematical framework which enables us to
analyze whether a transparent implementation in a computing model is feasible.
We exemplarily apply our trustworthiness framework to analyze deep learning
approaches for inverse problems in digital and analog computing models
represented by Turing and Blum-Shub-Smale Machines, respectively. Based on
previous results, we find that Blum-Shub-Smale Machines have the potential to
establish trustworthy solvers for inverse problems under fairly general
conditions, whereas Turing machines cannot guarantee trustworthiness to the
same degree.
( 2
min )
In this paper, we formulate the multi-agent graph bandit problem as a
multi-agent extension of the graph bandit problem introduced by Zhang,
Johansson, and Li [CISS 57, 1-6 (2023)]. In our formulation, $N$ cooperative
agents travel on a connected graph $G$ with $K$ nodes. Upon arrival at each
node, agents observe a random reward drawn from a node-dependent probability
distribution. The reward of the system is modeled as a weighted sum of the
rewards the agents observe, where the weights capture the decreasing marginal
reward associated with multiple agents sampling the same node at the same time.
We propose an Upper Confidence Bound (UCB)-based learning algorithm,
Multi-G-UCB, and prove that its expected regret over $T$ steps is bounded by
$O(N\log(T)[\sqrt{KT} + DK])$, where $D$ is the diameter of graph $G$. Lastly,
we numerically test our algorithm by comparing it to alternative methods.
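As a toy illustration of the UCB machinery such algorithms build on, the following single-agent sketch runs a UCB index on a small line graph with movement restricted to neighbors (the graph, rewards, and function names are invented for illustration; the paper's Multi-G-UCB coordinates $N$ agents with weighted rewards):

```python
import math
import random

def ucb_index(mean, count, t, c=2.0):
    """Standard UCB exploration index for one node's reward estimate."""
    if count == 0:
        return float("inf")  # force initial exploration of unvisited nodes
    return mean + math.sqrt(c * math.log(t) / count)

def run_graph_ucb(adjacency, rewards, steps, seed=0):
    """Single-agent sketch: at each step move to the current node or one of
    its neighbors with the highest UCB index, then observe a noisy reward."""
    rng = random.Random(seed)
    K = len(adjacency)
    means, counts = [0.0] * K, [0] * K
    node, total = 0, 0.0
    for t in range(1, steps + 1):
        # Movement is constrained to the current node and its neighbors.
        choices = [node] + adjacency[node]
        node = max(choices, key=lambda v: ucb_index(means[v], counts[v], t))
        r = rewards[node] + rng.gauss(0, 0.1)
        counts[node] += 1
        means[node] += (r - means[node]) / counts[node]  # running mean
        total += r
    return total / steps, counts

# Line graph 0-1-2, with node 2 carrying the highest mean reward.
adj = {0: [1], 1: [0, 2], 2: [1]}
avg, counts = run_graph_ucb(adj, rewards=[0.1, 0.3, 0.9], steps=2000)
```

As $T$ grows, the visit count of the best node should dominate, which is the behavior the logarithmic regret bound formalizes.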
( 2
min )
Geometric quantum machine learning based on equivariant quantum neural
networks (EQNN) recently appeared as a promising direction in quantum machine
learning. Despite the encouraging progress, the studies are still limited to
theory, and the role of hardware noise in EQNN training has never been
explored. This work studies the behavior of EQNN models in the presence of
noise. We show that certain EQNN models can preserve equivariance under Pauli
channels, while this is not possible under the amplitude damping channel. We
claim that the symmetry breaking grows linearly in the number of layers and
noise strength. We support our claims with numerical data from simulations as
well as hardware up to 64 qubits. Furthermore, we provide strategies to enhance
the symmetry protection of EQNN models in the presence of noise.
( 2
min )
The lack of anomaly detection methods during mechanized tunnelling can cause
financial loss and deficits in drilling time. On-site excavation requires hard
obstacles to be recognized prior to drilling in order to avoid damaging the
tunnel boring machine and to adjust the propagation velocity. The efficiency of
the structural anomaly detection can be increased with intelligent optimization
techniques and machine learning. In this research, the anomaly in a simple
structure is detected by comparing the experimental measurements of the
structural vibrations with numerical simulations using parameter estimation
methods.
( 2
min )
Adversarial attacks on learning-based trajectory predictors have already been
demonstrated. However, there are still open questions about the effects of
perturbations on trajectory predictor inputs other than state histories, and
how these attacks impact downstream planning and control. In this paper, we
conduct a sensitivity analysis on two trajectory prediction models,
Trajectron++ and AgentFormer. We observe that among all inputs, almost all of
the perturbation sensitivities for Trajectron++ lie only within the most recent
state history time point, while perturbation sensitivities for AgentFormer are
spread across state histories over time. We additionally demonstrate that,
despite dominant sensitivity on state history perturbations, an undetectable
image map perturbation made with the Fast Gradient Sign Method can induce large
prediction error increases in both models. Even though image maps may
contribute slightly to the prediction output of both models, this result
reveals that rather than being robust to adversarial image perturbations,
trajectory predictors are susceptible to image attacks. Using an
optimization-based planner and example perturbations crafted from sensitivity
results, we show how this vulnerability can cause a vehicle to come to a sudden
stop from moderate driving speeds.
( 2
min )
We introduce a cryptographic method to hide an arbitrary secret payload in
the response of a Large Language Model (LLM). A secret key is required to
extract the payload from the model's response, and without the key it is
provably impossible to distinguish between the responses of the original LLM
and the LLM that hides a payload. In particular, the quality of generated text
is not affected by the payload. Our approach extends a recent result of Christ,
Gunn and Zamir (2023) who introduced an undetectable watermarking scheme for
LLMs.
( 2
min )
Tactics, Techniques and Procedures (TTPs) represent sophisticated attack
patterns in the cybersecurity domain, described encyclopedically in textual
knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP
mapping, is an important and challenging task. Conventional learning approaches
often target the problem in the classical multi-class or multilabel
classification setting. This setting hinders the learning ability of the model
due to a large number of classes (i.e., TTPs), the inevitable skewness of the
label distribution and the complex hierarchical structure of the label space.
We formulate the problem in a different learning paradigm, where the assignment
of a text to a TTP label is decided by the direct semantic similarity between
the two, thus reducing the complexity of competing solely over the large
labeling space. To that end, we propose a neural matching architecture with an
effective sampling-based learn-to-compare mechanism, facilitating the learning
process of the matching model despite constrained resources.
( 2
min )
Malicious adversaries can attack machine learning models to infer sensitive
information or damage the system by launching a series of evasion attacks.
Although various work addresses privacy and security concerns, they focus on
individual defenses, but in practice, models may undergo simultaneous attacks.
This study explores the combination of adversarial training and differentially
private training to defend against simultaneous attacks. While
differentially-private adversarial training, as presented in DP-Adv,
outperforms the other state-of-the-art methods in performance, it lacks formal
privacy guarantees and empirical validation. Thus, in this work, we benchmark
the performance of this technique using a membership inference attack and
empirically show that the resulting approach is as private as non-robust
private models. This work also highlights the need to explore privacy
guarantees in dynamic training paradigms.
( 2
min )
We give a procedure for computing group-level $(\epsilon, \delta)$-DP
guarantees for DP-SGD, when using Poisson sampling or fixed batch size
sampling. Up to discretization errors in the implementation, the DP guarantees
computed by this procedure are tight (assuming we release every intermediate
iterate).
( 2
min )
Neural networks have been employed for a wide range of processing
applications like image processing, motor control, object detection and many
others. Living neural networks offer advantages of lower power consumption,
faster processing, and biological realism. Optogenetics offers high spatial and
temporal control over biological neurons and presents potential in training
live neural networks. This work proposes a simulated living neural network
trained indirectly by backpropagating STDP-based algorithms using precision
activation by optogenetics, achieving accuracy comparable to traditional
neural network training algorithms.
( 2
min )
Finding accurate solutions to the electronic Schr\"odinger equation plays an
important role in discovering important molecular and material energies and
characteristics. Consequently, solving systems with large numbers of electrons
has become increasingly important. Variational Monte Carlo (VMC) methods,
especially those approximated through deep neural networks, are promising in
this regard. In this paper, we aim to integrate one such model called the
FermiNet, a post-Hartree-Fock (HF) Deep Neural Network (DNN) model, into a
standard and widely used open source library, DeepChem. We also propose novel
initialization techniques to overcome the difficulties associated with the
assignment of excess or lack of electrons for ions.
( 2
min )
In this paper, we develop a deep learning-based bandwidth allocation policy
that is: 1) scalable with the number of users and 2) transferable to different
communication scenarios, such as non-stationary wireless channels, different
quality-of-service (QoS) requirements, and dynamically available resources. To
support scalability, the bandwidth allocation policy is represented by a graph
neural network (GNN), with which the number of training parameters does not
change with the number of users. To enable the generalization of the GNN, we
develop a hybrid-task meta-learning (HML) algorithm that trains the initial
parameters of the GNN with different communication scenarios during
meta-training. Next, during meta-testing, a few samples are used to fine-tune
the GNN with unseen communication scenarios. Simulation results demonstrate
that our HML approach can improve the initial performance by $8.79\%$, and
sampling efficiency by $73\%$, compared with existing benchmarks. After
fine-tuning, our near-optimal GNN-based policy can achieve close to the same
reward with much lower inference complexity compared to the optimal policy
obtained using iterative optimization.
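The adapt-then-update pattern of meta-training can be sketched with a first-order method; the following uses Reptile on a one-dimensional toy problem (this is not the paper's HML algorithm or its GNN, just the core idea of learning an initialization that fine-tunes quickly across different "scenarios"):

```python
import random

def sgd_step(w, task, lr):
    # Per-task quadratic loss (w - target)^2; one gradient step.
    return w - lr * 2 * (w - task)

def reptile(meta_steps=200, inner_steps=5, lr=0.1, meta_lr=0.5, seed=0):
    """First-order meta-learning sketch (Reptile): adapt to a sampled task
    for a few inner steps, then move the shared initialization toward the
    adapted weights. Tasks are drawn from two 'scenario' clusters."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        task = rng.choice([rng.gauss(2, 0.1), rng.gauss(4, 0.1)])
        w_task = w
        for _ in range(inner_steps):
            w_task = sgd_step(w_task, task, lr)
        w += meta_lr * (w_task - w)  # move the initialization
    return w

w0 = reptile()
```

The learned initialization settles between the two task clusters, so a few fine-tuning samples suffice at meta-test time, mirroring the few-shot fine-tuning described above.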
( 2
min )
Condition monitoring plays a significant role in the safety and reliability
of modern industrial systems. Artificial intelligence (AI) approaches are
gaining attention from academia and industry as a growing subject in industrial
applications and as a powerful way of identifying faults. This paper provides
an overview of intelligent condition monitoring and fault detection and
diagnosis methods for industrial plants with a focus on the open-source
benchmark Tennessee Eastman Process (TEP). In this survey, the most popular and
state-of-the-art deep learning (DL) and machine learning (ML) algorithms for
industrial plant condition monitoring, fault detection, and diagnosis are
summarized and the advantages and disadvantages of each algorithm are studied.
Challenges like imbalanced data, unlabelled samples and how deep learning
models can handle them are also covered. Finally, a comparison of the
accuracies and specifications of different algorithms utilizing the Tennessee
Eastman Process (TEP) is conducted. This research will be beneficial for both
researchers who are new to the field and experts, as it covers the literature
on condition monitoring and state-of-the-art methods alongside the challenges
and possible solutions to them.
( 2
min )
Stochastic generators are useful for estimating climate impacts on various
sectors. Projecting climate risk in various sectors, e.g. energy systems,
requires generators that are accurate (statistical resemblance to
ground-truth), reliable (do not produce erroneous examples), and efficient.
Leveraging data from the North American Land Data Assimilation System, we
introduce TemperatureGAN, a Generative Adversarial Network conditioned on
months, locations, and time periods, to generate 2m above ground atmospheric
temperatures at an hourly resolution. We propose evaluation methods and metrics
to measure the quality of generated samples. We show that TemperatureGAN
produces high-fidelity examples with good spatial representation and temporal
dynamics consistent with known diurnal cycles.
( 2
min )
Recent years have seen a surge of interest in the algorithmic estimation of
stochastic entropy production (EP) from trajectory data via machine learning. A
crucial element of such algorithms is the identification of a loss function
whose minimization guarantees the accurate EP estimation. In this study, we
show that there exists a host of loss functions, namely those implementing a
variational representation of the $\alpha$-divergence, which can be used for
the EP estimation. By fixing $\alpha$ to a value between $-1$ and $0$, the
$\alpha$-NEEP (Neural Estimator for Entropy Production) exhibits a much more
robust performance against strong nonequilibrium driving or slow dynamics,
which adversely affects the existing method based on the Kullback-Leibler
divergence ($\alpha = 0$). In particular, the choice of $\alpha = -0.5$ tends
to yield the optimal results. To corroborate our findings, we present an
exactly solvable simplification of the EP estimation problem, whose loss
function landscape and stochastic properties give deeper intuition into the
robustness of the $\alpha$-NEEP.
( 2
min )
This paper presents a new type of hybrid model for Bayesian optimization (BO)
adept at managing mixed variables, encompassing both quantitative (continuous
and integer) and qualitative (categorical) types. Our proposed new hybrid
models (named hybridM) merge the Monte Carlo Tree Search (MCTS) structure for
categorical variables with Gaussian Processes (GPs) for continuous ones. hybridM
leverages upper confidence bound tree search (UCTS) as its MCTS strategy,
showcasing the tree architecture's integration into Bayesian optimization. Our
innovations, including dynamic online kernel selection in the surrogate
modeling phase and a unique UCTS search strategy, position our hybrid models as
an advancement in mixed-variable surrogate models. Numerical experiments
underscore the superiority of hybrid models, highlighting their potential in
Bayesian optimization.
( 2
min )
Test log-likelihood is commonly used to compare different models of the same
data or different approximate inference algorithms for fitting the same
probabilistic model. We present simple examples demonstrating how comparisons
based on test log-likelihood can contradict comparisons according to other
objectives. Specifically, our examples show that (i) approximate Bayesian
inference algorithms that attain higher test log-likelihoods need not also
yield more accurate posterior approximations and (ii) conclusions about
forecast accuracy based on test log-likelihood comparisons may not agree with
conclusions based on root mean squared error.
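A minimal numeric example of point (ii): an overconfident model can win on forecast error yet lose badly on test log-likelihood (the numbers are invented for illustration):

```python
import math

def gaussian_loglik(y, mu, sigma):
    """Log density of y under a Gaussian predictive distribution N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (y - mu)**2 / (2 * sigma**2)

y = 0.0  # held-out observation
# Model A: less accurate mean, honest uncertainty.
ll_a, err_a = gaussian_loglik(y, mu=0.5, sigma=1.0), abs(0.5 - y)
# Model B: more accurate mean, badly overconfident variance.
ll_b, err_b = gaussian_loglik(y, mu=0.1, sigma=0.01), abs(0.1 - y)
# Model B has the smaller error but the far smaller log-likelihood,
# so the two criteria rank the models in opposite orders.
```
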
( 2
min )
Seven years ago, an unexpected nationwide shortage of radiologists was triggered by a single statement from Professor Geoffrey Hinton. The statement was: “I think if you work as a radiologist, you are like Wile E. Coyote in the cartoon. You are already over the edge of the cliff, but you have not looked down yet.”… Read More »The AI radiologists replacement saga: Don’t be misled by the scaremongering – science vs. science fiction
The post The AI radiologists replacement saga: Don’t be misled by the scaremongering – science vs. science fiction appeared first on Data Science Central.
( 23
min )
In an era of rapid digital transformation, leveraging data analytics and collaborative tools can be a game changer. One such integration that is proving to be impactful is that of data analytics with Slack. This powerful merger provides teams with the capability to engage and make decisions based on real-time insights, in the long… Read More »Unlocking team productivity: Integrating data analytics into your Slack workflow
( 21
min )
Amazon Textract is a machine learning (ML) service that enables automatic extraction of text, handwriting, and data from scanned documents, surpassing traditional optical character recognition (OCR). It can identify, understand, and extract data from tables and forms with remarkable accuracy. Presently, several companies rely on manual extraction methods or basic OCR software, which is tedious […]
( 7
min )
Neural construction models have shown promising performance for Vehicle
Routing Problems (VRPs) by adopting either the Autoregressive (AR) or
Non-Autoregressive (NAR) learning approach. While AR models produce
high-quality solutions, they generally have a high inference latency due to
their sequential generation nature. Conversely, NAR models generate solutions
in parallel with a low inference latency but generally exhibit inferior
performance. In this paper, we propose a generic Guided Non-Autoregressive
Knowledge Distillation (GNARKD) method to obtain high-performance NAR models
having a low inference latency. GNARKD removes the constraint of sequential
generation in AR models while preserving the learned pivotal components in the
network architecture to obtain the corresponding NAR models through knowledge
distillation. We evaluate GNARKD by applying it to three widely adopted AR
models to obtain NAR VRP solvers for both synthesized and real-world instances.
The experimental results demonstrate that GNARKD significantly reduces the
inference time (4-5 times faster) with an acceptable performance drop (2-3\%).
To the best of our knowledge, this study is the first of its kind to obtain NAR
VRP solvers from AR ones through knowledge distillation.
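The distillation step can be illustrated with a generic soft-target loss; the temperature, the per-decoding-step framing, and the function names below are assumptions for illustration, not GNARKD's exact objective:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over the per-step distributions of candidate
    next nodes, averaged over decoding steps. The NAR student is trained to
    match the AR teacher's soft targets in parallel, without sequential
    generation at inference time."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(-1)
    return float(kl.mean())
```

The loss is zero when the student reproduces the teacher's distribution exactly and positive otherwise, which is the property knowledge distillation relies on.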
( 3
min )
Exponential families are statistical models which are the workhorses in
statistics, information theory, and machine learning among others. An
exponential family can either be normalized subtractively by its cumulant or
free energy function or equivalently normalized divisively by its partition
function. Both subtractive and divisive normalizers are strictly convex and
smooth functions inducing pairs of Bregman and Jensen divergences. It is
well-known that skewed Bhattacharyya distances between probability densities
of an exponential family amount to skewed Jensen divergences induced by the
cumulant function between their corresponding natural parameters, and in limit
cases that the sided Kullback-Leibler divergences amount to reverse-sided
Bregman divergences. In this paper, we first show that the $\alpha$-divergences
between unnormalized densities of an exponential family amount to scaled
$\alpha$-skewed Jensen divergences induced by the partition function. We then
show how comparative convexity with respect to a pair of quasi-arithmetic means
allows one to deform both convex functions and their arguments, and thereby define
dually flat spaces with corresponding divergences when ordinary convexity is
preserved.
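For reference, the classical identity being generalized can be written as follows, with $F$ the cumulant function (my transcription of the standard result):

```latex
% Skewed Bhattacharyya distance between exponential-family densities
% equals the alpha-skewed Jensen divergence of the cumulant F:
D_{B,\alpha}(p_{\theta_1} : p_{\theta_2})
  = -\log \int p_{\theta_1}^{\alpha}(x)\, p_{\theta_2}^{1-\alpha}(x)\, \mathrm{d}x
  = J_{F,\alpha}(\theta_1 : \theta_2),
\qquad
J_{F,\alpha}(\theta_1 : \theta_2)
  = \alpha F(\theta_1) + (1-\alpha) F(\theta_2)
    - F\bigl(\alpha\theta_1 + (1-\alpha)\theta_2\bigr).
```

The paper's first contribution replaces the cumulant $F$ with the partition function to cover unnormalized densities and $\alpha$-divergences.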
( 2
min )
This paper presents the computational challenge on topological deep learning
that was hosted within the ICML 2023 Workshop on Topology and Geometry in
Machine Learning. The competition asked participants to provide open-source
implementations of topological neural networks from the literature by
contributing to the python packages TopoNetX (data processing) and TopoModelX
(deep learning). The challenge attracted twenty-eight qualifying submissions in
its two-month duration. This paper describes the design of the challenge and
summarizes its main findings.
( 2
min )
As large language models (LLMs) like ChatGPT have gained traction, an
increasing number of news websites have begun utilizing them to generate
articles. However, not only can these language models produce factually
inaccurate articles on reputable websites but disreputable news sites can
utilize LLMs to mass produce misinformation. To begin to understand this
phenomenon, we present one of the first large-scale studies of the prevalence
of synthetic articles within online news media. To do this, we train a
DeBERTa-based synthetic news detector and classify over 15.90 million articles
from 3,074 misinformation and mainstream news websites. We find that between
January 1, 2022, and May 1, 2023, the relative number of synthetic news
articles increased by 55.4% on mainstream websites while increasing by 457% on
misinformation sites. We find that this increase is largely driven by smaller,
less popular websites. Analyzing the impact of the release of ChatGPT using an
interrupted time series analysis, we show that while its release resulted in a marked
increase in synthetic articles on small sites as well as misinformation news
websites, there was not a corresponding increase on large mainstream news
websites.
( 3
min )
2024 promises to be a breakout year for Generative AI (GenAI) and AI. However, there are two challenges that organizations will face in 2024 to “leverage AI to get value from their data.” Challenge #1: Too much focus is on “implementing AI” and not enough on gaining organizational alignment regarding where and how value will… Read More »GenAI: Beware the Productivity Trap; It’s About Cultural Empowerment – Part 3
( 22
min )
AutoML platforms have numerous options for the algorithms to try for each
step of the analysis, i.e., different possible algorithms for imputation,
transformations, feature selection, and modelling. Finding the optimal
combination of algorithms and hyper-parameter values is computationally
expensive, as the number of combinations to explore leads to an exponential
explosion of the space. In this paper, we present the Sequential
Hyper-parameter Space Reduction (SHSR) algorithm that reduces the space for an
AutoML tool with negligible drop in its predictive performance. SHSR is a
meta-level learning algorithm that analyzes past runs of an AutoML tool on
several datasets and learns which hyper-parameter values to filter out from
consideration on a new dataset to analyze. SHSR is evaluated on 284
classification and 375 regression problems, showing an approximate 30%
reduction in execution time with a performance drop of less than 0.1%.
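The meta-level filtering idea can be sketched as follows; the gap-based rule, the threshold, and the toy history are invented stand-ins for illustration, not the actual SHSR criterion:

```python
def shsr_filter(history, threshold=0.005):
    """Toy meta-level filter: drop a hyper-parameter value if, across past
    datasets, the best score achieved with it trails the overall best score
    by more than `threshold` on average. Surviving values form the reduced
    space searched on a new dataset."""
    # history: {value: [best score achieved with that value, per past dataset]}
    datasets = len(next(iter(history.values())))
    overall_best = [max(scores[d] for scores in history.values())
                    for d in range(datasets)]
    keep = []
    for value, scores in history.items():
        avg_gap = sum(overall_best[d] - scores[d]
                      for d in range(datasets)) / datasets
        if avg_gap <= threshold:
            keep.append(value)
    return keep

# Hypothetical past performance of three modelling algorithms on 3 datasets.
history = {"rf": [0.90, 0.88, 0.91],
           "svm": [0.89, 0.88, 0.91],
           "knn": [0.80, 0.75, 0.78]}
keep = shsr_filter(history, threshold=0.005)
```

Here "knn" is consistently dominated and gets filtered out, shrinking the combinatorial space the AutoML tool must explore on the next dataset.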
( 2
min )
The privacy-utility tradeoff remains one of the fundamental issues of
differentially private machine learning. This paper introduces a geometrically
inspired kernel-based approach to mitigate the accuracy-loss issue in
classification. In this approach, a representation of the affine hull of given
data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads
to a novel distance measure that hides privacy-sensitive information about
individual data points and improves the privacy-utility tradeoff via
significantly reducing the risk of membership inference attacks. The
effectiveness of the approach is demonstrated through experiments on the MNIST
dataset, the Freiburg Groceries dataset, and a real biomedical dataset. It is
verified that the approach remains computationally practical. The application
of the approach to federated learning is considered, and it is observed that
the accuracy loss due to the data being distributed is marginal or
insignificant.
( 2
min )
Out-Of-Distribution (OOD) generalization is an essential topic in machine
learning. However, recent research has focused only on the corresponding
methods for neural networks. This paper introduces a novel and effective
solution for OOD generalization of decision tree models, named Invariant
Decision Tree (IDT). IDT enforces a penalty term with regard to the
unstable/varying behavior of a split across different environments during the
growth of the tree. Its ensemble version, the Invariant Random Forest (IRF), is
constructed. Our proposed method is motivated by a theoretical result under
mild conditions, and validated by numerical tests with both synthetic and real
datasets. The superior performance compared to non-OOD tree models implies that
considering OOD generalization for tree models is absolutely necessary and
should be given more attention.
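The instability penalty can be sketched for a single candidate split; the impurity measure, the penalty form, and the weight `lam` are illustrative assumptions, not the exact IDT objective:

```python
def split_gain_with_penalty(env_data, threshold, feature=0, lam=1.0):
    """Toy IDT-style score: average impurity reduction of a split across
    environments, minus a penalty on how much the left-branch fraction
    varies between environments (an unstable split is penalized)."""
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)  # fraction of positive labels
        return 2 * p * (1 - p)

    gains, left_fracs = [], []
    for X, y in env_data:
        left = [yi for xi, yi in zip(X, y) if xi[feature] <= threshold]
        right = [yi for xi, yi in zip(X, y) if xi[feature] > threshold]
        n = len(y)
        gain = (gini(y) - (len(left) / n) * gini(left)
                - (len(right) / n) * gini(right))
        gains.append(gain)
        left_fracs.append(len(left) / n)
    mean_frac = sum(left_fracs) / len(left_fracs)
    penalty = sum((f - mean_frac) ** 2 for f in left_fracs) / len(left_fracs)
    return sum(gains) / len(gains) - lam * penalty

# Two identical toy environments: the split at 1.5 is perfect and stable.
envs = [([[0], [1], [2], [3]], [0, 0, 1, 1])] * 2
score = split_gain_with_penalty(envs, threshold=1.5)
```

A split whose branch assignment shifts between environments would incur a positive penalty and be disfavored during tree growth, which is the invariance mechanism described above.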
( 2
min )
We introduce a novel computational unit for neural networks that features
multiple biases, challenging the traditional perceptron structure. This unit
emphasizes the importance of preserving uncorrupted information as it is passed
from one unit to the next, applying activation functions later in the process
with specialized biases for each unit. Through both empirical and theoretical
analyses, we show that by focusing on increasing biases rather than weights,
there is potential for significant enhancement in a neural network model's
performance. This approach offers an alternative perspective on optimizing
information flow within neural networks. See source code at
https://github.com/CuriosAI/dac-dev.
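The abstract does not spell out the unit's exact formulation; one plausible reading, where a shared bias-free pre-activation is forwarded uncorrupted and each receiving connection applies its own bias before the nonlinearity, can be sketched as follows (all names and shapes here are hypothetical):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def multi_bias_layer(x, W, B):
    """Hypothetical multi-bias unit (a sketch of the idea, not the paper's code).

    Each unit computes a shared pre-activation s = W @ x, passes it on
    uncorrupted, and every receiving connection applies its own bias
    before the nonlinearity: out[j, i] = relu(s[i] + B[j, i]).
    A classic perceptron layer is the special case where all rows of B
    are identical.
    """
    s = W @ x                      # shared, bias-free pre-activations
    return relu(s[None, :] + B)    # one biased activation per receiver
```

With identical rows in B this reduces exactly to the usual relu(W @ x + b), which is the sense in which the unit generalizes the perceptron.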
( 2 min )
Bayesian Optimization (BO) is typically used to optimize an unknown function
$f$ that is noisy and costly to evaluate, by exploiting an acquisition function
that must be maximized at each optimization step. Although provably
asymptotically optimal BO algorithms are efficient at optimizing
low-dimensional functions, scaling them to high-dimensional spaces remains an
open problem, often tackled by assuming an additive structure for $f$. By doing
so, BO algorithms typically introduce additional restrictive assumptions on the
additive structure that reduce their applicability domain. This paper contains
two main contributions: (i) we relax the restrictive assumptions on the
additive structure of $f$ without weakening the maximization guarantees of the
acquisition function, and (ii) we address the over-exploration problem for
decentralized BO algorithms. To these ends, we propose DuMBO, an asymptotically
optimal decentralized BO algorithm that achieves very competitive performance
against state-of-the-art BO algorithms, especially when the additive structure
of $f$ comprises high-dimensional factors.
( 2 min )
Real-time and accurate traffic flow prediction is the foundation for ensuring
the efficient operation of intelligent transportation systems. In existing
traffic flow prediction methods based on graph neural networks (GNNs),
pre-defined graphs are usually used to describe the spatial correlations of
different traffic nodes in urban road networks. However, the ability of
pre-defined graphs to describe spatial correlation is limited by prior
knowledge and by the graph generation method. Although time-varying graphs
based on data-driven learning can partially overcome the drawbacks of
pre-defined graphs, the learning ability of existing adaptive graphs is
limited: for example, time-varying graphs cannot adequately capture the
inherent spatial correlations in traffic flow data. To solve these problems, we
propose a hybrid time-varying graph neural network (HTVGNN) for traffic flow
prediction.
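The abstract does not detail how its time-varying graphs are learned; for context, a common data-driven adaptive-adjacency construction from the traffic-forecasting literature (not necessarily the one used by HTVGNN) is:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_adjacency(E1, E2):
    """Data-driven adjacency from learnable node embeddings E1, E2.

    A = softmax(relu(E1 @ E2.T)) yields a row-normalized, learned graph;
    in a forecasting model the embeddings are trained end-to-end with the
    prediction loss, so the graph adapts to the data rather than being
    pre-defined.
    """
    logits = np.maximum(E1 @ E2.T, 0.0)
    return softmax(logits, axis=1)
```

Each row of the result is a probability distribution over neighbors, so it can be plugged directly into a graph-convolution step.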
( 2 min )
Neural network wavefunctions optimized using the variational Monte Carlo
method have been shown to produce highly accurate results for the electronic
structure of atoms and small molecules, but the high cost of optimizing such
wavefunctions prevents their application to larger systems. We propose the
Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to
reduce this bottleneck. SPRING combines ideas from the recently introduced
minimum-step stochastic reconfiguration optimizer (MinSR) and the classical
randomized Kaczmarz method for solving linear least-squares problems. We
demonstrate that SPRING outperforms both MinSR and the popular
Kronecker-Factored Approximate Curvature method (KFAC) across a number of small
atoms and molecules, given that the learning rates of all methods are optimally
tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after
forty thousand training iterations, whereas both MinSR and KFAC fail to do so
even after one hundred thousand iterations.
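One of the classical ingredients SPRING draws on, the randomized Kaczmarz method, can be illustrated in its textbook form (this is the building block, not the paper's optimizer):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz for a consistent linear system A x = b.

    Each step picks a row with probability proportional to its squared
    norm and projects the current iterate onto that row's hyperplane,
    converging linearly in expectation to the solution.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = (A * A).sum(axis=1)
    probs = row_norms / row_norms.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x
```

On a well-conditioned overdetermined consistent system the iterates reach the exact solution to high precision within a few thousand steps.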
( 2 min )
Generating explanations for reinforcement learning (RL) is challenging as
actions may produce long-term effects on the future. In this paper, we develop
a novel framework for explainable RL by learning a causal world model without
prior knowledge of the causal structure of the environment. The model captures
the influence of actions, allowing us to interpret the long-term effects of
actions through causal chains, which present how actions influence
environmental variables and finally lead to rewards. Different from most
explanatory models which suffer from low accuracy, our model remains accurate
while improving explainability, making it applicable in model-based learning.
As a result, we demonstrate that our causal model can serve as the bridge
between explainability and learning.
( 2 min )
This paper examines some common problems in Human-Robot Interaction (HRI) that
cause failures and breakdowns in chat. For a given use case, the design
decisions start with choosing the suitable robot and the suitable chatting
model, identifying the common problems that cause failures, identifying
potential solutions, and planning for continuous improvement. In conclusion, it
is recommended to use a closed-loop control algorithm that guides the use of
pre-trained Artificial Intelligence (AI) models and provides vocabulary
filtering, re-trains batched models on new datasets, learns online from data
streams, and/or uses reinforcement learning to self-update the trained models
and reduce errors.
( 2 min )
We propose an adjusted Wasserstein distributionally robust estimator, based on
a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. The classic WDRO estimator isasymptotically
biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting
in a smaller asymptotic mean squared error. Meanwhile, the adjusted WDRO
estimator retains an out-of-sample performance guarantee. Further, under
certain conditions, our adjustment technique provides a general principle for
de-biasing asymptotically biased estimators. Specifically, we investigate how
the adjusted WDRO estimator is developed for generalized linear models,
including logistic regression, linear regression, and Poisson regression.
Numerical experiments demonstrate the favorable practical performance of the
adjusted estimator over the classic one.
( 2 min )
Protein post-translational modification (PTM) site prediction is a
fundamental task in bioinformatics. Several computational methods have been
developed to predict PTM sites. However, existing methods ignore structure
information and merely utilize protein sequences. Furthermore, a more
fine-grained structure representation learning method is urgently needed, as
PTM is a biological event that occurs at the atom granularity. In this paper,
we propose a PTM site prediction method based on the Coupling of
Multi-Granularity structure and Multi-Scale sequence representation, PTM-CMGMS
for brevity. Specifically, multi-granularity structure-aware representation
learning is designed to learn neighborhood structure representations at the
amino acid, atom, and whole-protein granularity from AlphaFold-predicted
structures, followed by contrastive learning to optimize the structure
representations. Additionally, multi-scale sequence representation learning is
used to extract context sequence information, and motifs generated by aligning
all context sequences of PTM sites assist the prediction. Extensive experiments
on three datasets show that PTM-CMGMS outperforms the state-of-the-art methods.
( 2 min )
We propose a new algorithm for the problem of recovering data that adheres to
multiple, heterogeneous low-dimensional structures from linear observations.
Focusing on data matrices that are simultaneously row-sparse and low-rank, we
propose and analyze an iteratively reweighted least squares (IRLS) algorithm
that is able to leverage both structures. In particular, it optimizes a
combination of non-convex surrogates for row-sparsity and rank, a balancing of
which is built into the algorithm. We prove locally quadratic convergence of
the iterates to a simultaneously structured data matrix in a regime of minimal
sample complexity (up to constants and a logarithmic factor), which is known to
be impossible for a combination of convex surrogates. In experiments, we show
that the IRLS method exhibits favorable empirical convergence, identifying
simultaneously row-sparse and low-rank matrices from fewer measurements than
state-of-the-art methods. Code is available at
https://github.com/ckuemmerle/simirls.
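The reweighting idea can be illustrated on the simpler single-structure case of sparse vector recovery; the paper's algorithm additionally couples a non-convex rank surrogate for matrices, which this sketch omits:

```python
import numpy as np

def irls_sparse(A, b, iters=100, eps=1e-6):
    """Iteratively reweighted least squares for the l1-minimization
    problem min ||x||_1 subject to A x = b.

    Each iteration solves a weighted least-norm problem whose weights
    come from the previous iterate, so small entries are driven toward
    zero while the constraint A x = b stays satisfied.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm start
    for _ in range(iters):
        w = np.abs(x) + eps                    # weights: sum x_i^2 / w_i ~ ||x||_1
        G = (A * w) @ A.T                      # A diag(w) A^T
        x = w * (A.T @ np.linalg.solve(G, b))  # x = W A^T G^{-1} b
    return x
```

With enough Gaussian measurements relative to the sparsity level, this recovers the planted sparse vector to high accuracy.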
( 2 min )
Collective motion is a ubiquitous phenomenon in nature, inspiring engineers,
physicists and mathematicians to develop mathematical models and bio-inspired
designs. Collective motion at small to medium group sizes ($\sim$10-1000
individuals, also called the `mesoscale'), can show nontrivial features due to
stochasticity. Therefore, characterizing both the deterministic and stochastic
aspects of the dynamics is crucial in the study of mesoscale collective
phenomena. Here, we use a physics-inspired, neural-network based approach to
characterize the stochastic group dynamics of interacting individuals, through
a stochastic differential equation (SDE) that governs the collective dynamics
of the group. We apply this technique on both synthetic and real-world
datasets, and identify the deterministic and stochastic aspects of the dynamics
using drift and diffusion fields, enabling us to make novel inferences about
the nature of order in these systems.
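The drift/diffusion decomposition can be illustrated on a one-dimensional toy series; the paper uses a neural-network parameterization of the group-level SDE, whereas this sketch fits a linear drift with a standard Kramers-Moyal-style estimator (our illustrative assumptions, not the paper's code):

```python
import numpy as np

def estimate_drift_diffusion(x, dt):
    """Estimate a linear drift and a constant diffusion from a 1-D series.

    Model: dx = f(x) dt + sqrt(2 D) dW, with f(x) ~ a + b x fitted from
    the conditional increments; D comes from the residual variance of the
    increments, Var(residual) = 2 D dt.
    """
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    coef, *_ = np.linalg.lstsq(X, dx / dt, rcond=None)
    resid = dx - (X @ coef) * dt
    D = resid.var() / (2 * dt)
    return coef, D
```

On a simulated Ornstein-Uhlenbeck process the fitted drift slope and diffusion constant closely match the simulation parameters.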
( 2 min )
In this work, we introduce ChatQA, a family of conversational question
answering (QA) models that obtain GPT-4 level accuracies. Specifically, we
propose a two-stage instruction tuning method that can significantly improve
the zero-shot conversational QA results from large language models (LLMs). To
handle retrieval in conversational QA, we fine-tune a dense retriever on a
multi-turn QA dataset, which provides comparable results to using the
state-of-the-art query rewriting model while largely reducing deployment cost.
Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10
conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic
data from OpenAI GPT models.
( 2 min )
Atrial fibrillation (AF) is a common cardiac arrhythmia characterized by
rapid and irregular contractions of the atria. It significantly elevates the
risk of strokes due to slowed blood flow in the atria, especially in the left
atrial appendage, which is prone to blood clot formation. Such clots can
migrate into cerebral arteries, leading to ischemic stroke. To assess whether
AF patients should be prescribed anticoagulants, doctors often use the
CHA2DS2-VASc scoring system. However, anticoagulant use must be approached with
caution as it can impact clotting functions. This study introduces a machine
learning algorithm that predicts whether patients with AF should be recommended
anticoagulant therapy using 12-lead ECG data. In this model, we use STOME to
enhance time-series data and then process it through a Convolutional Neural
Network (CNN). By incorporating a path development layer, the model achieves a
specificity of 30.6% at a negative predictive value (NPV) of 1. In contrast,
LSTM algorithms without path development yield a specificity of only 2.7% under
the same NPV condition.
( 2 min )
Magnetic navigation (MagNav) is a rising alternative to the Global
Positioning System (GPS) and has proven useful for aircraft navigation.
Traditional aircraft navigation systems, while effective, face limitations in
precision and reliability in certain environments and against attacks. Airborne
MagNav leverages the Earth's magnetic field to provide accurate positional
information. However, external magnetic fields induced by aircraft electronics
and Earth's large-scale magnetic fields disrupt the weaker signal of interest.
We introduce a physics-informed approach using Tolles-Lawson coefficients for
compensation and Liquid Time-Constant Networks (LTCs) to remove complex, noisy
signals derived from the aircraft's magnetic sources. Using real flight data
with magnetometer and aircraft measurements, we observe up to a 64% reduction
in aeromagnetic compensation error (RMSE, in nT), outperforming
conventional models. This significant improvement underscores the potential of
a physics-informed, machine learning approach for extracting clean, reliable,
and accurate magnetic signals for MagNav positional estimation.
( 2 min )
Technology companies are highly interested in recommendation systems nowadays.
As businesses constantly gain users and products, the numbers of users and
items continuously grow to very large values. Traditional recommendation
algorithms, whose complexity depends on the number of users and items, are
therefore difficult to adapt to the industrial environment. In this paper, we
introduce a new method that applies graph neural networks with a contrastive
learning framework to extract user preferences. We incorporate a soft
clustering architecture that significantly reduces the computational cost of
the inference process. Experiments show that the model is able to learn user
preferences with low computational cost in both the training and prediction
phases, while achieving very good accuracy. We call this architecture
EfficientRec, implying model compactness and the ability to scale to unlimited
users and products.
( 2 min )
This work introduces a framework to address the computational complexity
inherent in Mixed-Integer Programming (MIP) models by harnessing the potential
of deep learning. We compare the effectiveness of (a) feed-forward neural
networks (ANN) and (b) convolutional neural networks (CNN) in approximating the
active dimensions within MIP problems. We utilize multi-label classification to
account for more than one active dimension. To enhance the framework's
performance, we employ Bayesian optimization for hyperparameter tuning, aiming
to maximize sample-level accuracy. The primary objective is to train the neural
networks to predict all active dimensions accurately, thereby maximizing the
occurrence of global optimum solutions. We apply this framework to a flow-based
facility location allocation Mixed-Integer Linear Programming (MILP)
formulation that describes long-term investment planning and medium-term
tactical planning in a personalized medicine supply chain for cell therapy
manufacturing and distribution.
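The multi-label prediction of active dimensions can be sketched with independent sigmoid outputs, one per candidate dimension, so several dimensions can be flagged active at once; this linear stand-in is only illustrative of the labeling setup, not the paper's ANN/CNN architectures:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_multilabel(X, Y, lr=0.5, steps=400):
    """Multi-label logistic classifier: one independent sigmoid output per
    candidate dimension, trained by gradient descent on the summed
    binary cross-entropy loss.
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    for _ in range(steps):
        P = sigmoid(X @ W)
        W -= lr * X.T @ (P - Y) / n   # gradient of the summed BCE loss
    return W

def predict_active(X, W, thresh=0.5):
    """Threshold each sigmoid output independently: a sample may have
    zero, one, or several dimensions predicted active."""
    return sigmoid(X @ W) > thresh
```

On linearly separable synthetic labels the per-dimension classifiers reach high training accuracy after a few hundred gradient steps.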
( 2 min )
Cloud radiative feedback impacts early tropical cyclone (TC) intensification,
but limitations in existing diagnostic frameworks make them unsuitable for
studying asymmetric or transient radiative heating. We propose a linear
Variational Encoder-Decoder (VED) to learn the hidden relationship between
radiation and the surface intensification of realistic simulated TCs. Limiting
VED model inputs enables using its uncertainty to identify periods when
radiation has more importance for intensification. A close examination of the
extracted 3D radiative structures suggests that longwave radiative forcing from
inner core deep convection and shallow clouds both contribute to
intensification, with the deep convection having the most impact overall. We
find that deep convection downwind of the shallow clouds is critical to the
intensification of Haiyan. Our work demonstrates that machine learning can
discover thermodynamic-kinematic relationships without relying on axisymmetric
or deterministic assumptions, paving the way towards the objective discovery of
processes leading to TC intensification in realistic conditions.
( 2 min )
In this paper, we introduce eipy, an open-source Python package for developing
effective, multi-modal heterogeneous ensembles for classification. eipy
provides a rigorous and user-friendly framework for
comparing and selecting the best-performing multi-modal data integration and
predictive modeling methods by systematically evaluating their performance
using nested cross-validation. The package is designed to leverage
scikit-learn-like estimators as components to build multi-modal predictive
models. An up-to-date user guide, including API reference and tutorials, for
eipy is maintained at https://eipy.readthedocs.io . The main repository for
this project can be found on GitHub at https://github.com/GauravPandeyLab/eipy .
( 2 min )
In this paper we propose a new non-linear classifier based on a combination
of locally linear classifiers. A well-known optimization formulation is given
as we cast the problem as an $\ell_1$ Multiple Kernel Learning (MKL) problem
using many locally linear kernels. Since the number of such kernels is huge, we
provide a scalable generic MKL training algorithm handling streaming kernels.
With respect to the inference time, the resulting classifier fits the gap
between high accuracy but slow non-linear classifiers (such as classical MKL)
and fast but low accuracy linear classifiers.
( 2 min )
In the realm of robot action recognition, identifying distinct but spatially
proximate arm movements using vision systems in noisy environments poses a
significant challenge. This paper studies robot arm action recognition in noisy
environments using machine learning techniques. Specifically, a vision system
is used to track the robot's movements followed by a deep learning model to
extract the arm's key points. Through a comparative analysis of machine
learning methods, the effectiveness and robustness of this model are assessed
in noisy environments. A case study was conducted using the Tic-Tac-Toe game in
a 3-by-3 grid environment, where the focus is to accurately identify the
actions of the arms in selecting specific locations within this constrained
environment. Experimental results show that our approach can achieve precise
key point detection and action classification despite the addition of noise and
uncertainties to the dataset.
( 2 min )
The advent of large language models (LLMs) such as ChatGPT has attracted
considerable attention in various domains due to their remarkable performance
and versatility. As the use of these models continues to grow, the importance
of effective prompt engineering has come to the fore. Prompt optimization
emerges as a crucial challenge, as it has a direct impact on model performance
and the extraction of relevant information. Recently, evolutionary algorithms
(EAs) have shown promise in addressing this issue, paving the way for novel
optimization strategies. In this work, we propose an evolutionary
multi-objective (EMO) approach specifically tailored for prompt optimization,
called EMO-Prompts, using sentiment analysis capabilities as our experimental
targets. Our results demonstrate that
EMO-Prompts effectively generates prompts capable of guiding the LLM to produce
texts embodying two conflicting emotions simultaneously.
( 2 min )
In the field of scientific computing, many problem-solving approaches tend to
focus only on the process and the final outcome. Even in AI for science, there
is a lack of deep multimodal information mining behind the data, and a
multimodal framework akin to those in the image-text domain is missing. In this
paper, we take Symbolic Regression (SR) as our focal point and, drawing
inspiration from the BLIP model in the image-text domain, propose a scientific
computing multimodal framework based on Function Images (Funcimg) and Operation
Tree Sequences (OTS), named the Bootstrapping OTS-Funcimg Pre-training Model
(Botfip). In
SR experiments, we validate the advantages of Botfip in low-complexity SR
problems, showcasing its potential. As a MED framework, Botfip holds promise
for future applications in a broader range of scientific computing problems.
( 2 min )
Hierarchical federated learning (HFL) enables distributed training of models
across multiple devices with the help of several edge servers and a cloud
server in a privacy-preserving manner. In this paper, we consider HFL with
highly mobile devices, mainly targeting vehicular networks. Through
convergence analysis, we show that mobility influences the convergence speed by
both fusing the edge data and shuffling the edge models. While mobility is
usually considered as a challenge from the perspective of communication, we
prove that it increases the convergence speed of HFL with edge-level
heterogeneous data, since more diverse data can be incorporated. Furthermore,
we demonstrate that a higher speed leads to faster convergence, since it
accelerates the fusion of data. Simulation results show that mobility increases
the model accuracy of HFL by up to 15.1% when training a convolutional neural
network on the CIFAR-10 dataset.
( 2 min )
We demonstrate and evaluate a fully-blind digital signal processing (DSP)
chain for 100G passive optical networks (PONs), and analyze different equalizer
topologies based on neural networks with low hardware complexity.
( 2 min )
This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to
YouTube videos facilitating the creation of a novel speaker recognition
dataset. VoxCeleb-ESP captures real-world scenarios, incorporating diverse
speaking styles, noises, and channel distortions. It includes 160 Spanish
celebrities spanning various categories, ensuring a representative distribution
across age groups and geographic regions in Spain. We provide two speaker trial
lists for speaker identification tasks, one with same-video and the other with
different-video target trials, accompanied by a cross-lingual
evaluation of ResNet pretrained models. Preliminary speaker identification
results suggest that the complexity of the detection task in VoxCeleb-ESP is
equivalent to that of the original and much larger VoxCeleb in English.
VoxCeleb-ESP contributes to the expansion of speaker recognition benchmarks
with a comprehensive and diverse dataset for the Spanish language.
( 2 min )
The quality of recorded videos and images is significantly influenced by the
camera's field of view (FOV). In critical applications like surveillance
systems and self-driving cars, an inadequate FOV can give rise to severe safety
and security concerns, including car accidents and thefts due to the failure to
detect individuals and objects. The conventional methods for establishing the
correct FOV heavily rely on human judgment and lack automated mechanisms to
assess video and image quality based on FOV. In this paper, we introduce an
innovative approach that harnesses semantic line detection and classification
alongside deep Hough transform to identify semantic lines, thus ensuring a
suitable FOV by understanding the 3D view through parallel lines. Our approach
yields an effective F1 score of 0.729 on the public EgoCart dataset, coupled
with a notably high median score in the line placement metric. We illustrate
that our method offers a straightforward means of assessing the quality of the
camera's field of view, achieving a classification accuracy of 83.8\%. This
metric can serve as a proxy for evaluating the potential performance of video
and image quality applications.
( 2 min )
Advancements in machine learning (ML) have significantly revolutionized
medical image analysis, prompting hospitals to rely on external ML services.
However, the exchange of sensitive patient data, such as chest X-rays, poses
inherent privacy risks when shared with third parties. Addressing this concern,
we propose MedBlindTuner, a privacy-preserving framework leveraging fully
homomorphic encryption (FHE) and a data-efficient image transformer (DEiT).
MedBlindTuner enables the training of ML models exclusively on FHE-encrypted
medical images. Our experimental evaluation demonstrates that MedBlindTuner
achieves comparable accuracy to models trained on non-encrypted images,
offering a secure solution for outsourcing ML computations while preserving
patient data privacy. To the best of our knowledge, this is the first work that
uses data-efficient image transformers and fully homomorphic encryption in this
domain.
( 2 min )
Expensive ultrasonic anemometers are usually required to measure wind speed
accurately. The aim of this work is to overcome the loss of accuracy that
changes in air temperature cause in a low-cost hot-wire anemometer, by means of
a probabilistic calibration using Gaussian Process Regression. Gaussian Process
Regression is a non-parametric, Bayesian, supervised learning method designed
to predict an unknown target variable as a function of one or more known input
variables. Our approach is validated against real datasets, obtaining good
performance in inferring the actual wind speed values. Calibrating the hot-wire
anemometer with air temperature taken into account, before its real use in the
field, permits wind speed to be estimated over the typical range of ambient
temperatures, with a grounded uncertainty estimate for each speed measurement.
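The calibration idea, predicting wind speed from the raw hot-wire reading together with air temperature and returning an uncertainty for each prediction, can be sketched with a textbook GP regressor (synthetic data; the paper's kernel and hyperparameters are not given in the abstract):

```python
import numpy as np

def gp_regression(X_train, y_train, X_test, length=1.0, sig2=1.0, noise2=1e-3):
    """Plain Gaussian Process Regression with an RBF kernel, returning the
    posterior predictive mean and standard deviation.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sig2 * np.exp(-0.5 * d2 / length**2)

    K = rbf(X_train, X_train) + noise2 * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    var = sig2 + noise2 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))
```

In the hypothetical calibration below, a sensor reading drifts with temperature and the GP learns the inverse mapping from (raw reading, temperature) to true speed, with a positive predictive standard deviation for every estimate.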
( 2 min )
The synthesis of string transformation programs from input-output examples
utilizes various techniques, all based on an inductive bias that comprises a
restricted set of basic operators to be combined. A new algorithm, Transduce,
is proposed, which is founded on the construction of abstract transduction
grammars and their generalization. We experimentally demonstrate that Transduce
can learn positional transformations efficiently from one or two positive
examples without inductive bias, achieving a success rate higher than the
current state of the art.
( 2 min )
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established upper bound is larger than the lower
bound by a factor that is logarithmic in the width.
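The quantity being bounded can be probed numerically; for a shallow ReLU network (here without biases, a simplification of the paper's setting), the largest gradient norm over sampled inputs lower-bounds the Lipschitz constant while the product of operator norms upper-bounds it (a numerical companion to the setting, not the paper's proof technique):

```python
import numpy as np

def shallow_relu(x, W, v):
    """f(x) = <v, relu(W x)>: a shallow bias-free ReLU network."""
    return v @ np.maximum(W @ x, 0.0)

def lipschitz_bounds(W, v, n_samples=2000, seed=0):
    """Empirical lower bound and product upper bound on the Lipschitz
    constant of a shallow ReLU network.

    The gradient at x is W^T (v * 1[Wx > 0]); its largest norm over
    sampled points lower-bounds L, while ||v||_2 * ||W||_2 upper-bounds it.
    """
    rng = np.random.default_rng(seed)
    d = W.shape[1]
    lower = 0.0
    for x in rng.normal(size=(n_samples, d)):
        g = W.T @ (v * (W @ x > 0))
        lower = max(lower, np.linalg.norm(g))
    upper = np.linalg.norm(v) * np.linalg.norm(W, 2)
    return lower, upper
```

With He-initialized weights the gap between the two bounds is what results like those in the abstract pin down theoretically.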
( 2 min )
We consider transformer encoders with hard attention (in which all attention
is focused on exactly one position) and strict future masking (in which each
position only attends to positions strictly to its left), and prove that the
class of languages recognized by these networks is exactly the star-free
languages. Adding position embeddings increases the class of recognized
languages to other well-studied classes. A key technique in these proofs is
Boolean RASP, a variant of RASP that is restricted to Boolean values. Via the
star-free languages, we relate transformers to first-order logic, temporal
logic, and algebraic automata theory.
( 2 min )
Denoising diffusions are a powerful method to generate approximate samples
from high-dimensional data distributions. Recent results provide polynomial
bounds on their convergence rate, assuming $L^2$-accurate scores. Until now,
the tightest bounds were either superlinear in the data dimension or required
strong smoothness assumptions. We provide the first convergence bounds which
are linear in the data dimension (up to logarithmic factors) assuming only
finite second moments of the data distribution. We show that diffusion models
require at most $\tilde O(\frac{d \log^2(1/\delta)}{\varepsilon^2})$ steps to
approximate an arbitrary distribution on $\mathbb{R}^d$ corrupted with Gaussian
noise of variance $\delta$ to within $\varepsilon^2$ in KL divergence. Our
proof extends the Girsanov-based methods of previous works. We introduce a
refined treatment of the error from discretizing the reverse SDE inspired by
stochastic localization.
( 2 min )
Developing tools to automatically detect check-worthy claims in political
debates and speeches can greatly help moderators of debates, journalists, and
fact-checkers. While previous work on this problem has focused exclusively on
the text modality, here we explore the utility of the audio modality as an
additional input. We create a new multimodal dataset (text and audio in
English) containing 48 hours of speech from past political debates in the USA.
We then experimentally demonstrate that, in the case of multiple speakers,
adding the audio modality yields sizable improvements over using the text
modality alone; moreover, an audio-only model could outperform a text-only one
for a single speaker. To enable future research, we make all our
data and code publicly available at
https://github.com/petar-iv/audio-checkworthiness-detection.
( 2 min )
DNNs are widely used but face significant computational costs due to matrix
multiplications, especially from data movement between the memory and
processing units. One promising approach is therefore Processing-in-Memory
(PIM), as it greatly reduces this overhead. However, most PIM solutions rely
either on novel memory technologies that have yet to mature or on bit-serial
computations that have significant performance overhead and scalability issues.
Our work proposes an in-SRAM digital multiplier that uses a conventional memory
to perform bit-parallel computations by leveraging multiple-wordline
activation. We
then introduce DAISM, an architecture leveraging this multiplier, which
achieves up to two orders of magnitude higher area efficiency compared to the
SOTA counterparts, with competitive energy efficiency.
( 2 min )
As a classical generative modeling approach, energy-based models have the
natural advantage of flexibility in the form of the energy function. Recently,
energy-based models have achieved great success in modeling high-dimensional
data in computer vision and natural language processing. In line with these
advancements, we build a multi-purpose energy-based probabilistic model for
High Energy Physics events at the Large Hadron Collider. This framework builds
on a powerful generative model and describes higher-order inter-particle
interactions. It suits different encoding architectures and builds on implicit
generation. As for applicative aspects, it can serve as a powerful
parameterized event generator for physics simulation, a generic anomalous
signal detector free from spurious correlations, and an augmented event
classifier for particle identification.
( 2 min )
We introduce a novel procedure for obtaining cross-validated predictive
estimates for Bayesian hierarchical regression models (BHRMs). Bayesian
hierarchical models are popular for their ability to model complex dependence
structures and provide probabilistic uncertainty estimates, but can be
computationally expensive to run. Cross-validation (CV) is therefore not a
common practice to evaluate the predictive performance of BHRMs. Our method
circumvents the need to re-run computationally costly estimation methods for
each cross-validation fold and makes CV more feasible for large BHRMs. By
conditioning on the variance-covariance parameters, we shift the CV problem
from probability-based sampling to a simple and familiar optimization problem.
In many cases, this produces estimates which are equivalent to full CV. We
provide theoretical results and demonstrate its efficacy on publicly available
data and in simulations.
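The conditioning trick can be illustrated for a Gaussian linear model: once the variance components are fixed, the coefficient posterior mode is a ridge solution with penalty sigma^2/tau^2, so each CV fold reduces to a closed-form solve instead of a re-run of the sampler (an illustrative reduction under our assumptions, not the paper's general BHRM machinery):

```python
import numpy as np

def conditional_cv_mse(X, y, lam, k=5, seed=0):
    """K-fold CV for a Gaussian hierarchical linear model, conditioning on
    the variance components.

    Given the variances, the coefficient posterior mode equals a ridge
    estimate with lam = sigma^2 / tau^2, so each fold is a cheap linear
    solve rather than a full probabilistic re-fit.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        tr = np.setdiff1d(idx, f)
        Xt, yt = X[tr], y[tr]
        beta = np.linalg.solve(Xt.T @ Xt + lam * np.eye(X.shape[1]), Xt.T @ yt)
        errs.append(((y[f] - X[f] @ beta) ** 2).mean())
    return float(np.mean(errs))
```

On well-specified synthetic data the resulting CV error sits near the noise floor, far below the variance of the response.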
( 2 min )
Federated learning is inherently hampered by data heterogeneity: non-IID
training data distributed over local clients. We propose a novel model training
approach for federated learning, FLex&Chill, which exploits the Logit Chilling
method. Through extensive evaluations, we demonstrate that, in the presence of
the non-IID data characteristics inherent in federated learning systems, this
approach can expedite model convergence and improve inference accuracy.
Quantitatively, from our experiments, we observe up to 6X improvement in the
global federated learning model convergence time, and up to 3.37% improvement
in inference accuracy.
( 2 min )
Graph Neural Networks (GNNs) have become the preferred tool to process graph
data, with their efficacy being boosted through graph data augmentation
techniques. Despite the evolution of augmentation methods, issues like graph
property distortions and restricted structural changes persist. This leads to
the question: Is it possible to develop more property-conserving and
structure-sensitive augmentation methods? Through a spectral lens, we
investigate the interplay between graph properties, their augmentation, and
their spectral behavior, and find that keeping the low-frequency eigenvalues
unchanged can preserve the critical properties at a large scale when generating
augmented graphs. These observations inform our introduction of the Dual-Prism
(DP) augmentation method, comprising DP-Noise and DP-Mask, which adeptly
retains essential graph properties while diversifying augmented graphs.
Extensive experiments validate the efficiency of our approach, providing a new
and promising direction for graph data augmentation.
( 2
min )
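The observation that fixing the low-frequency eigenvalues preserves key graph properties suggests a simple spectral augmentation: perturb only the upper end of the Laplacian spectrum. A minimal numpy sketch in the spirit of DP-Noise (an illustration under assumed details, not the authors' code):

```python
import numpy as np

def dp_noise(adj, keep=2, scale=0.1, seed=0):
    """Spectral augmentation sketch: eigendecompose the Laplacian,
    add noise only to the high-frequency eigenvalues, and keep the
    lowest `keep` eigenvalues (which encode global graph properties)
    unchanged before reconstructing."""
    L = np.diag(adj.sum(1)) - adj
    w, V = np.linalg.eigh(L)              # eigenvalues in ascending order
    rng = np.random.default_rng(seed)
    w_new = w.copy()
    w_new[keep:] += rng.normal(scale=scale, size=len(w) - keep)
    return V @ np.diag(w_new) @ V.T       # augmented Laplacian

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
L_aug = dp_noise(A)
w_orig = np.linalg.eigvalsh(np.diag(A.sum(1)) - A)
w_aug = np.linalg.eigvalsh(L_aug)
print(np.allclose(w_orig[:2], w_aug[:2]))  # low-frequency spectrum preserved
```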
Graph Neural Networks (GNNs) have shown considerable effectiveness in a
variety of graph learning tasks, particularly those based on the
message-passing approach in recent years. However, their performance is often
constrained by a limited receptive field, a challenge that becomes more acute
in the presence of sparse graphs. Motivated by power series, whose expansions
have infinitely many terms, we propose a novel Graph Power Filter Neural
Network (GPFN) that
enhances node classification by employing a power series graph filter to
augment the receptive field. Concretely, our GPFN designs a new way to build a
graph filter with an infinite receptive field based on the convergence power
series, which can be analyzed in the spectral and spatial domains. Besides, we
theoretically prove that our GPFN is a general framework that can integrate any
power series and capture long-range dependencies. Finally, experimental results
on three datasets demonstrate the superiority of our GPFN over state-of-the-art
baselines.
( 2
min )
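A convergent power series of a normalized adjacency matrix yields a filter with an effectively infinite receptive field. A sketch using the geometric series, whose truncation approaches the closed form (I - beta * A_hat)^(-1) (an illustration; the paper's GPFN framework admits any convergent power series):

```python
import numpy as np

def power_series_filter(adj, beta=0.2, terms=50):
    """Truncated geometric-series graph filter sum_k (beta * A_hat)^k,
    whose limit is (I - beta * A_hat)^{-1} when the series converges
    (spectral radius of beta * A_hat below 1)."""
    deg = adj.sum(1)
    A_hat = adj / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    out = np.eye(len(adj))
    term = np.eye(len(adj))
    for _ in range(terms):
        term = beta * (A_hat @ term)
        out = out + term
    closed = np.linalg.inv(np.eye(len(adj)) - beta * A_hat)
    return out, closed

A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
truncated, closed = power_series_filter(A)
print(np.allclose(truncated, closed))
```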
Digital-analog quantum computing (DAQC) is an alternative paradigm for
universal quantum computation combining digital single-qubit gates with global
analog operations acting on a register of interacting qubits. Currently, no
available open-source software is tailored to express, differentiate, and
execute programs within the DAQC paradigm. In this work, we address this
shortfall by presenting Qadence, a high-level programming interface for
building complex digital-analog quantum programs developed at Pasqal. Thanks to
its flexible interface, native differentiability, and focus on real-device
execution, Qadence aims at advancing research on variational quantum algorithms
built for native DAQC platforms such as Rydberg atom arrays.
( 2
min )
We propose an adjusted Wasserstein distributionally robust estimator, based on
a nonlinear transformation of the Wasserstein distributionally robust (WDRO)
estimator in statistical learning. The classic WDRO estimator is asymptotically
biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting
in a smaller asymptotic mean squared error. Meanwhile, the proposed adjusted
WDRO has an out-of-sample performance guarantee. Further, under certain
conditions, our proposed adjustment technique provides a general principle to
de-bias asymptotically biased estimators. Specifically, we investigate how the
adjusted WDRO estimator is developed in generalized linear models, including
logistic regression, linear regression, and Poisson regression.
Numerical experiments demonstrate the favorable practical performance of the
adjusted estimator over the classic one.
( 2
min )
As a classical generative modeling approach, energy-based models have the
natural advantage of flexibility in the form of the energy function. Recently,
energy-based models have achieved great success in modeling high-dimensional
data in computer vision and natural language processing. In line with these
advancements, we build a multi-purpose energy-based probabilistic model for
High Energy Physics events at the Large Hadron Collider. This framework builds
on a powerful generative model and describes higher-order inter-particle
interactions. It suits different encoding architectures and builds on implicit
generation. In terms of applications, it can serve as a powerful
parameterized event generator for physics simulation, a generic anomalous
signal detector free from spurious correlations, and an augmented event
classifier for particle identification.
( 2
min )
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established upper bound is larger than the lower
bound by a factor that is logarithmic in the width.
( 2
min )
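For a shallow ReLU network the Jacobian on each linear region is W2 diag(1[W1 x + b > 0]) W1, so sampling inputs yields an empirical lower bound on the Lipschitz constant, while the product of layer spectral norms gives a crude upper bound. A sketch under a He-style initialization (illustrative only; the paper's bounds are sharper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, width = 5, 200
# He-style initialization: weights ~ N(0, 2/fan_in), symmetric biases
W1 = rng.normal(scale=np.sqrt(2.0 / d), size=(width, d))
b1 = rng.normal(scale=0.1, size=width)
W2 = rng.normal(scale=np.sqrt(2.0 / width), size=(1, width))

def local_lipschitz(x):
    """Spectral norm of the Jacobian W2 @ diag(relu'(W1 x + b1)) @ W1,
    exact on the linear region containing x."""
    active = (W1 @ x + b1 > 0).astype(float)
    J = W2 @ (active[:, None] * W1)
    return np.linalg.norm(J, 2)

samples = rng.normal(size=(500, d))
lower = max(local_lipschitz(x) for x in samples)       # empirical lower bound
upper = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)  # product upper bound
print(lower <= upper)
```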
Denoising diffusions are a powerful method to generate approximate samples
from high-dimensional data distributions. Recent results provide polynomial
bounds on their convergence rate, assuming $L^2$-accurate scores. Until now,
the tightest bounds were either superlinear in the data dimension or required
strong smoothness assumptions. We provide the first convergence bounds which
are linear in the data dimension (up to logarithmic factors) assuming only
finite second moments of the data distribution. We show that diffusion models
require at most $\tilde O(\frac{d \log^2(1/\delta)}{\varepsilon^2})$ steps to
approximate an arbitrary distribution on $\mathbb{R}^d$ corrupted with Gaussian
noise of variance $\delta$ to within $\varepsilon^2$ in KL divergence. Our
proof extends the Girsanov-based methods of previous works. We introduce a
refined treatment of the error from discretizing the reverse SDE inspired by
stochastic localization.
( 2
min )
In this paper we propose a new non-linear classifier based on a combination
of locally linear classifiers. A well-known optimization formulation arises as
we cast the problem as an $\ell_1$ Multiple Kernel Learning (MKL) problem
using many locally linear kernels. Since the number of such kernels is huge, we
provide a scalable generic MKL training algorithm handling streaming kernels.
With respect to inference time, the resulting classifier fills the gap
between high accuracy but slow non-linear classifiers (such as classical MKL)
and fast but low accuracy linear classifiers.
( 2
min )
Expensive ultrasonic anemometers are usually required to measure wind speed
accurately. The aim of this work is to overcome the loss of accuracy of a
low-cost hot-wire anemometer caused by changes in air temperature, by means of
a probabilistic calibration using Gaussian Process Regression. Gaussian Process
Regression is a non-parametric, Bayesian, and supervised learning method
designed to make predictions of an unknown target variable as a function of one
or more known input variables. Our approach is validated against real datasets,
obtaining good performance in inferring the actual wind speed values.
Calibrating the hot-wire anemometer against air temperature before its
deployment in the field allows the wind speed to be estimated across the
typical range of ambient temperatures, with a grounded uncertainty estimate
for each speed measurement.
( 2
min )
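Gaussian Process Regression itself fits in a few lines: the posterior mean and variance follow from the kernel matrix of the training inputs. A minimal sketch with an RBF kernel and hypothetical calibration data pairing (hot-wire voltage, air temperature) with wind speed (illustrative values, not the authors' dataset):

```python
import numpy as np

def rbf(A, B, length=1.0, var=1.0):
    """Squared-exponential (RBF) kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, Xs, noise=1e-4):
    """Minimal GP regression: posterior mean and standard deviation
    at test inputs Xs, given training pairs (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kss = rbf(Xs, Xs)
    mean = Ks @ np.linalg.solve(K, y)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0, None))

# Hypothetical calibration pairs: [voltage, temperature] -> wind speed
X = np.array([[0.2, 15.0], [0.5, 15.0], [0.8, 25.0], [1.1, 25.0]])
y = np.array([1.0, 2.5, 4.0, 6.0])
mean, std = gp_predict(X, y, X)
print(np.allclose(mean, y, atol=0.05))
```

The posterior standard deviation is what supplies the "grounded uncertainty estimation" for each speed measurement.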
In this paper, we discuss a potential agenda for future work in the theory of
random sets and belief functions, touching upon a number of focal issues: the
development of a fully-fledged theory of statistical reasoning with random
sets, including the generalisation of logistic regression and of the classical
laws of probability; the further development of the geometric approach to
uncertainty, to include general random sets, a wider range of uncertainty
measures and alternative geometric representations; the application of this new
theory to high-impact areas such as climate change, machine learning and
statistical learning theory.
( 2
min )
In this post, we demonstrate how to use neural architecture search (NAS) based structural pruning to compress a fine-tuned BERT model to improve model performance and reduce inference times. Pre-trained language models (PLMs) are undergoing rapid commercial and enterprise adoption in the areas of productivity tools, customer service, search and recommendations, business process automation, and […]
( 15
min )
Despite the success of deep learning-based algorithms, it is widely known
that neural networks may fail to be robust. A popular paradigm to enforce
robustness is adversarial training (AT), however, this introduces many
computational and theoretical difficulties. Recent works have developed a
connection between AT in the multiclass classification setting and
multimarginal optimal transport (MOT), unlocking a new set of tools to study
this problem. In this paper, we leverage the MOT connection to propose
computationally tractable numerical algorithms for computing universal lower
bounds on the optimal adversarial risk and identifying optimal classifiers. We
propose two main algorithms based on linear programming (LP) and entropic
regularization (Sinkhorn). Our key insight is that one can harmlessly truncate
the higher order interactions between classes, preventing the combinatorial run
times typically encountered in MOT problems. We validate these results with
experiments on MNIST and CIFAR-$10$, which demonstrate the tractability of our
approach.
( 2
min )
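The entropic-regularization route relies on Sinkhorn iterations. The classic two-marginal version below illustrates the scaling scheme that the paper generalizes to the multimarginal, truncated-interaction setting:

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=1.0, iters=200):
    """Two-marginal Sinkhorn iteration for entropy-regularized optimal
    transport: alternately rescale the kernel K = exp(-C/eps) so the
    transport plan matches both marginals."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan

mu = np.array([0.5, 0.5])
nu = np.array([0.3, 0.7])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
P = sinkhorn(mu, nu, C)
print(np.allclose(P.sum(1), mu) and np.allclose(P.sum(0), nu))
```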
Online reviews in the form of user-generated content (UGC) significantly
impact consumer decision-making. However, the pervasive issue of not only human
fake content but also machine-generated content challenges UGC's reliability.
Recent advances in Large Language Models (LLMs) may pave the way to fabricate
indistinguishable fake generated content at a much lower cost. Leveraging
OpenAI's GPT-4-Turbo and DALL-E-2 models, we craft AiGen-FoodReview, a
multi-modal dataset of 20,144 restaurant review-image pairs divided into
authentic and machine-generated. We explore unimodal and multimodal detection
models, achieving 99.80% multimodal accuracy with FLAVA. We use attributes from
readability and photographic theories to score reviews and images,
respectively, demonstrating their utility as hand-crafted features in scalable
and interpretable detection models, with comparable performance. The paper
contributes by open-sourcing the dataset and releasing fake review detectors,
recommending its use in unimodal and multimodal fake review detection tasks,
and evaluating linguistic and visual features in synthetic versus authentic
data.
( 2
min )
This paper introduces the Expected Booking (xB) model, a novel metric
designed to estimate the likelihood of a foul resulting in a yellow card in
football. Through three iterative experiments, employing ensemble methods, the
model demonstrates improved performance with additional features and an
expanded dataset. Analysis of FIFA World Cup 2022 data validates the model's
efficacy in providing insights into team and player fouling tactics, aligning
with actual defensive performance. The xB model addresses a gap in the study
of fouling efficiency, emphasizing defensive strategies that are often
overlooked. Further enhancements are suggested through the incorporation of
comprehensive data and spatial features.
( 2
min )
This paper discusses the limitations of machine learning (ML), particularly
deep artificial neural networks (ANNs), which are effective at approximating
complex functions but often lack transparency and explanatory power. It
highlights the `problem of induction': the philosophical issue that past
observations may not necessarily predict future events, a challenge that ML
models face when encountering new, unseen data. The paper argues for the
importance of not just making predictions but also providing good explanations,
a feature that current models often fail to deliver. It suggests that for AI to
progress, we must seek models that offer insights and explanations, not just
predictions.
( 2
min )
This paper proposes two methods for causal additive models with unobserved
variables (CAM-UV). CAM-UV assumes that the causal functions take the form of
generalized additive models and that latent confounders are present. First, we
propose a method that leverages prior knowledge for efficient causal discovery.
Then, we propose an extension of this method for inferring causality in time
series data. The original CAM-UV algorithm differs from other existing causal
function models in that it does not seek the causal order between observed
variables, but rather aims to identify the causes for each observed variable.
Therefore, the first proposed method in this paper utilizes prior knowledge,
such as understanding that certain variables cannot be causes of specific
others. Moreover, by incorporating the prior knowledge that causes precede
their effects in time, we extend the first algorithm to the second method for
causal discovery in time series data. We validate the first proposed method by
using simulated data to demonstrate that the accuracy of causal discovery
increases as more prior knowledge is accumulated. Additionally, we test the
second proposed method by comparing it with existing time series causal
discovery methods, using both simulated data and real-world data.
( 3
min )
Adversarial Attacks on Face Recognition (FR) encompass two types:
impersonation attacks and evasion attacks. We observe that achieving a
successful impersonation attack on FR does not necessarily ensure a successful
dodging attack on FR in the black-box setting. Introducing a novel attack
method named Pre-training Pruning Restoration Attack (PPR), we aim to enhance
the performance of dodging attacks whilst avoiding the degradation of
impersonation attacks. Our method employs adversarial example pruning, enabling
a portion of the adversarial perturbations to be set to zero while largely
maintaining attack performance. By utilizing adversarial example pruning, we
can prune the pre-trained adversarial examples and selectively free up certain
adversarial perturbations. Thereafter, we embed adversarial perturbations in
the pruned area, which enhances the dodging performance of the adversarial face
examples. The effectiveness of our proposed attack method is demonstrated
through our experimental results, showcasing its superior performance.
( 2
min )
With growing concerns surrounding privacy and regulatory compliance, the
concept of machine unlearning has gained prominence, aiming to selectively
forget or erase specific learned information from a trained model. In response
to this critical need, we introduce a novel approach called Attack-and-Reset
for Unlearning (ARU). This algorithm leverages meticulously crafted adversarial
noise to generate a parameter mask, effectively resetting certain parameters
and rendering them unlearnable. ARU outperforms current state-of-the-art
results on two facial machine-unlearning benchmark datasets, MUFAC and MUCAC.
In particular, we present the steps involved in attacking and masking that
strategically filter and re-initialize network parameters biased towards the
forget set. Our work represents a significant advancement in rendering data
unexploitable to deep learning models through parameter re-initialization,
achieved by harnessing adversarial noise to craft a mask.
( 2
min )
The goal of real-time lyrics alignment is to take live singing audio as input
and to pinpoint the exact position within given lyrics on the fly. The task can
benefit real-world applications such as the automatic subtitling of live
concerts or operas. However, designing a real-time model poses a great
challenge due to the constraints of using only past input and operating with
minimal latency. Furthermore, due to the lack of datasets for real-time
models for lyrics alignment, previous studies have mostly evaluated on
private in-house datasets, resulting in a lack of standard evaluation methods.
This paper presents a real-time lyrics alignment system for classical vocal
performances with two contributions. First, we improve the lyrics alignment
algorithm by finding an optimal combination of chromagram and phonetic
posteriorgram (PPG) that capture melodic and phonetic features of the singing
voice, respectively. Second, we recast the Schubert Winterreise Dataset (SWD)
which contains multiple performance renditions of the same pieces as an
evaluation set for real-time lyrics alignment.
( 2
min )
Graph neural networks are increasingly becoming the framework of choice for
graph-based machine learning. In this paper, we propose a new graph neural
network architecture that substitutes classical message passing with an
analysis of the local distribution of node features. To this end, we extract
the distribution of features in the egonet for each local neighbourhood and
compare them against a set of learned label distributions by taking the
histogram intersection kernel. The similarity information is then propagated to
other nodes in the network, effectively creating a message passing-like
mechanism where the message is determined by the ensemble of the features. We
perform an ablation study to evaluate the network's performance under different
choices of its hyper-parameters. Finally, we test our model on standard graph
classification and regression benchmarks, and we find that it outperforms
widely used alternative approaches, including both graph kernels and graph
neural networks.
( 2
min )
We introduce a novel capacity measure 2sED for statistical models based on
the effective dimension. The new quantity provably bounds the generalization
error under mild assumptions on the model. Furthermore, simulations on standard
data sets and popular model architectures show that 2sED correlates well with
the training error. For Markovian models, we show how to efficiently
approximate 2sED from below through a layerwise iterative approach, which
allows us to tackle deep learning models with a large number of parameters.
Simulation results suggest that the approximation is good for different
prominent models and data sets.
( 2
min )
Due to the complex behavior arising from non-uniqueness, symmetry, and
bifurcations in the solution space, solving inverse problems of nonlinear
differential equations (DEs) with multiple solutions is a challenging task. To
address this, we propose homotopy physics-informed neural networks (HomPINNs),
a novel framework that leverages homotopy continuation and neural networks
(NNs) to solve inverse problems. The proposed framework begins with the use of
NNs to simultaneously approximate unlabeled observations across diverse
solutions while adhering to DE constraints. Through homotopy continuation, the
proposed method solves the inverse problem by tracing the observations and
identifying multiple solutions. The experiments involve testing the performance
of the proposed method on one-dimensional DEs and applying it to solve a
two-dimensional Gray-Scott simulation. Our findings demonstrate that the
proposed method is scalable and adaptable, providing an effective solution for
solving DEs with multiple solutions and unknown parameters. Moreover, it has
significant potential for various applications in scientific computing, such as
modeling complex systems and solving inverse problems in physics, chemistry,
biology, etc.
( 3
min )
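Homotopy continuation deforms an easy problem g(x) = 0 into the target f(x) = 0 through H(x, t) = (1 - t) g(x) + t f(x), tracking each starting root with Newton corrections as t goes from 0 to 1. A scalar sketch recovering the three roots of x^3 - 3x from those of x^3 - x (a toy illustration of the continuation idea, not the HomPINN training loop):

```python
def newton(h, dh, x, steps=20):
    """Newton's method for a scalar equation h(x) = 0."""
    for _ in range(steps):
        x -= h(x) / dh(x)
    return x

def homotopy_trace(f, df, g, dg, x0, n=10):
    """Trace one root of the easy problem g(x)=0 along the homotopy
    H(x,t) = (1-t) g(x) + t f(x) toward a root of f(x)=0."""
    x = x0
    for k in range(1, n + 1):
        t = k / n
        h = lambda x, t=t: (1 - t) * g(x) + t * f(x)
        dh = lambda x, t=t: (1 - t) * dg(x) + t * df(x)
        x = newton(h, dh, x)
    return x

f, df = lambda x: x**3 - 3 * x, lambda x: 3 * x**2 - 3
g, dg = lambda x: x**3 - x, lambda x: 3 * x**2 - 1   # easy start: roots -1, 0, 1
roots = sorted(homotopy_trace(f, df, g, dg, x0) for x0 in (-1.0, 0.0, 1.0))
print([round(r, 3) for r in roots])  # → [-1.732, 0.0, 1.732]
```

Starting from each root of the easy problem recovers a distinct solution of the target, which is how continuation exposes multiple solutions.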
Semantic similarity between natural language texts is typically measured
either by looking at the overlap between subsequences (e.g., BLEU) or by using
embeddings (e.g., BERTScore, S-BERT). Within this paper, we argue that when we
are only interested in measuring the semantic similarity, it is better to
directly predict the similarity using a fine-tuned model for such a task. Using
a fine-tuned model for the Semantic Textual Similarity Benchmark tasks (STS-B)
from the GLUE benchmark, we define the STSScore approach and show that the
resulting similarity is better aligned with our expectations on a robust
semantic similarity measure than other approaches.
( 2
min )
Anomaly, or out-of-distribution, detection is a promising tool for aiding
discoveries of new particles or processes in particle physics. In this work, we
identify and address two overlooked opportunities to improve anomaly detection
for high-energy physics. First, rather than train a generative model on the
single most dominant background process, we build detection algorithms using
representation learning from multiple background types, thus taking advantage
of more information to improve estimation of what is relevant for detection.
Second, we generalize decorrelation to the multi-background setting, thus
directly enforcing a more complete definition of robustness for anomaly
detection. We demonstrate the benefit of the proposed robust multi-background
anomaly detection algorithms on a high-dimensional dataset of particle decays
at the Large Hadron Collider.
( 2
min )
Unsupervised Multiple Domain Translation is the task of transforming data
from one domain to other domains without having paired data to train the
systems. Typically, methods based on Generative Adversarial Networks (GANs) are
used to address this task. However, our proposal exclusively relies on a
modified version of a Variational Autoencoder. This modification consists of
the use of two latent variables disentangled in a controlled way by design. One
of these latent variables is constrained to depend exclusively on the domain,
while the other must capture the remaining factors of variability in the data.
Additionally, the conditions imposed over the domain latent variable allow for
better control and understanding of the latent space. We empirically
demonstrate that our approach works on different vision datasets, improving on
the performance of other well-known methods. Finally, we prove that, indeed, one of
the latent variables stores all the information related to the domain and the
other one hardly contains any domain information.
( 2
min )
Audio embeddings are crucial tools in understanding large catalogs of music.
Typically, embeddings are evaluated on the basis of the performance they
provide in a wide range of downstream tasks; however, few studies have
investigated the
local properties of the embedding spaces themselves which are important in
nearest neighbor algorithms, commonly used in music search and recommendation.
In this work we show that when learning audio representations on music datasets
via contrastive learning, musical properties that are typically homogeneous
within a track (e.g., key and tempo) are reflected in the locality of
neighborhoods in the resulting embedding space. By applying appropriate data
augmentation strategies, the localisation of such properties can not only be
reduced but that of other attributes increased. For example, the locality of
features such as pitch and tempo, which are less relevant to non-expert
listeners, can be suppressed while improving the locality of more salient
features such as genre and mood, achieving state-of-the-art performance in
nearest neighbor retrieval accuracy. Similarly, we show that the optimal
selection of data augmentation strategies for contrastive learning of music
audio embeddings is dependent on the downstream task, highlighting this as an
important embedding design decision.
( 3
min )
End-to-end learning has emerged as a major paradigm for developing autonomous
systems. Unfortunately, with its performance and convenience comes an even
greater challenge of safety assurance. A key factor of this challenge is the
absence of the notion of a low-dimensional and interpretable dynamical state,
around which traditional assurance methods revolve. Focusing on the online
safety prediction problem, this paper proposes a configurable family of
learning pipelines based on generative world models, which do not require
low-dimensional states. To implement these pipelines, we overcome the
challenges of learning safety-informed latent representations and missing
safety labels under prediction-induced distribution shift. These pipelines come
with statistical calibration guarantees on their safety chance predictions
based on conformal prediction. We perform an extensive evaluation of the
proposed learning pipelines on two case studies of image-controlled systems: a
racing car and a cartpole.
( 2
min )
Vertical Federated Learning (VFL) is a crucial paradigm for training machine
learning models on feature-partitioned, distributed data. However, due to
privacy restrictions, few public real-world VFL datasets exist for algorithm
evaluation, and these represent a limited array of feature distributions.
Existing benchmarks often resort to synthetic datasets, derived from arbitrary
feature splits from a global set, which only capture a subset of feature
distributions, leading to inadequate algorithm performance assessment. This
paper addresses these shortcomings by introducing two key factors affecting VFL
performance - feature importance and feature correlation - and proposing
associated evaluation metrics and dataset splitting methods. Additionally, we
introduce a real VFL dataset to address the deficit in image-image VFL
scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides
valuable insights for future research in the field.
( 2
min )
We study the classical Network Revenue Management (NRM) problem with
accept/reject decisions and $T$ IID arrivals. We consider a distributional form
where each arrival must fall under a finite number of possible categories, each
with a deterministic resource consumption vector, but a random value
distributed continuously over an interval. We develop an online algorithm that
achieves $O(\log^2 T)$ regret under this model, with the only (necessary)
assumption being that the probability densities are bounded away from 0. We
derive a second result that achieves $O(\log T)$ regret under an additional
assumption of second-order growth. To our knowledge, these are the first
results achieving logarithmic-level regret in an NRM model with continuous
values that do not require any kind of ``non-degeneracy'' assumptions. Our
results are achieved via new techniques including a new method of bounding
myopic regret, a ``semi-fluid'' relaxation of the offline allocation, and an
improved bound on the ``dual convergence''.
( 2
min )
Deep learning techniques, despite their potential, often suffer from a lack
of reproducibility and generalizability, impeding their clinical adoption.
Image segmentation is one of the critical tasks in medical image analysis, in
which one or several regions/volumes of interest should be annotated. This
paper introduces the RIDGE checklist, a framework for assessing the
Reproducibility, Integrity, Dependability, Generalizability, and Efficiency of
deep learning-based medical image segmentation models. The checklist serves as
a guide for researchers to enhance the quality and transparency of their work,
ensuring that segmentation models are not only scientifically sound but also
clinically relevant.
( 2
min )
Continual learning, the ability of a model to learn over time without
forgetting previous knowledge and, therefore, be adaptive to new data, is
paramount in dynamic fields such as disease outbreak prediction. Deep neural
networks, e.g., LSTMs, are prone to error due to catastrophic forgetting. This
study introduces a novel CEL model for continual learning by leveraging domain
adaptation via Elastic Weight Consolidation (EWC). This model aims to mitigate
the catastrophic forgetting phenomenon in a domain incremental setting. The
Fisher Information Matrix (FIM) is constructed with EWC to develop a
regularization term that penalizes changes to important parameters, namely, the
important previous knowledge. CEL's performance is evaluated on three distinct
diseases, Influenza, Mpox, and Measles, with different metrics. Its high
R-squared values during evaluation and reevaluation show that CEL outperforms
other state-of-the-art models in several contexts and adapts well to
incremental data. CEL's robustness and reliability are underscored by its
minimal 65% forgetting rate and 18% higher memory stability compared to
existing benchmark studies. This study highlights CEL's versatility in disease
outbreak prediction, addressing evolving data with temporal patterns. It offers
a valuable model for proactive disease control with accurate, timely
predictions.
( 2
min )
We present Scalable Interpolant Transformers (SiT), a family of generative
models built on the backbone of Diffusion Transformers (DiT). The interpolant
framework, which allows for connecting two distributions in a more flexible way
than standard diffusion models, makes possible a modular study of various
design choices impacting generative models built on dynamical transport: using
discrete vs. continuous time learning, deciding the objective for the model to
learn, choosing the interpolant connecting the distributions, and deploying a
deterministic or stochastic sampler. By carefully introducing the above
ingredients, SiT surpasses DiT uniformly across model sizes on the conditional
ImageNet 256x256 benchmark using the exact same backbone, number of parameters,
and GFLOPs. By exploring various diffusion coefficients, which can be tuned
separately from learning, SiT achieves an FID-50K score of 2.06.
( 2
min )
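The interpolant framework connects two distributions through x_t = alpha(t) x0 + sigma(t) x1, with the schedules alpha and sigma chosen freely. A sketch of the simplest linear choice (illustrative only; SiT studies many such design choices and objectives):

```python
import numpy as np

def linear_interpolant(x0, x1, t):
    """One simple stochastic-interpolant choice: x_t = (1-t) x0 + t x1,
    connecting a noise sample x0 (at t=0) to a data sample x1 (at t=1).
    The constant velocity x1 - x0 is a natural regression target for a
    model trained on this path."""
    return (1 - t) * x0 + t * x1

rng = np.random.default_rng(0)
x0 = rng.normal(size=4)                  # noise endpoint
x1 = np.array([1.0, 2.0, 3.0, 4.0])      # data endpoint
print(np.allclose(linear_interpolant(x0, x1, 0.0), x0),
      np.allclose(linear_interpolant(x0, x1, 1.0), x1))
```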
Using neural networks to localize a key fob within and around a car as a
security feature for keyless entry is fast emerging. In this paper we study:
1) the performance of neural-network-based UWB (ultra-wideband) localization
classification using pre-computed features, forming the baseline of our
experiments; 2) the inherent robustness of various neural networks, studying
robustness to adversarial examples without any adversarial training; and 3) a
proposed multi-head self-supervised neural network architecture that
outperforms the baseline neural networks without any adversarial training. The
model's performance
improved by 67% at certain ranges of adversarial magnitude for fast gradient
sign method and 37% each for basic iterative method and projected gradient
descent method.
( 2
min )
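The fast gradient sign method referenced above can be sketched on a logistic model, where the loss gradient with respect to the input has a closed form (a toy illustration, not the paper's UWB setup):

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method on a logistic model: perturb the input
    by eps * sign of the cross-entropy gradient w.r.t. x."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    grad_x = (p - y) * w                 # d(cross-entropy)/dx in closed form
    return x + eps * np.sign(grad_x)

w, b = np.array([1.0, -2.0]), 0.1
x, y = np.array([0.5, 0.5]), 1.0

def loss(x):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return -np.log(p) if y == 1.0 else -np.log(1 - p)

x_adv = fgsm(x, y, w, b, eps=0.1)
print(loss(x_adv) > loss(x))   # the attack increases the loss
```

The basic iterative method and projected gradient descent mentioned in the abstract repeat this single step with a projection back into the allowed perturbation ball.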
We present a novel method for anomaly detection in Solar System object data,
in preparation for the Legacy Survey of Space and Time. We train a deep
autoencoder for anomaly detection and use the learned latent space to search
for other interesting objects. We demonstrate the efficacy of the autoencoder
approach by finding interesting examples, such as interstellar objects, and
show that using the autoencoder, further examples of interesting classes can be
found. We also investigate the limits of classic unsupervised approaches to
anomaly detection through the generation of synthetic anomalies and evaluate
the feasibility of using a supervised learning approach. Future work should
consider expanding the feature space to increase the variety of anomalies that
can be uncovered during the survey using an autoencoder.
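The reconstruction-error idea behind autoencoder anomaly detection can be sketched with a toy linear autoencoder on synthetic two-dimensional data; everything here (data, architecture, thresholds) is illustrative and stands in for the survey features, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "normal" population: points near a 1-D subspace of R^2.
t = rng.normal(size=(500, 1))
X = np.hstack([t, 2 * t]) + 0.05 * rng.normal(size=(500, 2))

# One-hidden-unit linear autoencoder trained by gradient descent on MSE.
W_enc = rng.normal(size=(2, 1)) * 0.1
W_dec = rng.normal(size=(1, 2)) * 0.1
lr = 0.05
for _ in range(2000):
    Z = X @ W_enc              # latent codes, shape (500, 1)
    E = Z @ W_dec - X          # reconstruction residuals
    # Gradients of the mean squared reconstruction error.
    W_dec -= lr * (Z.T @ E) / len(X)
    W_enc -= lr * (X.T @ (E @ W_dec.T)) / len(X)

def recon_error(x):
    return np.sum((x @ W_enc @ W_dec - x) ** 2)

normal_err = np.median([recon_error(x) for x in X])
anomaly = np.array([3.0, -3.0])   # lies far off the learned manifold
print(normal_err, recon_error(anomaly))
```

Points off the learned low-dimensional manifold reconstruct poorly, so their error flags them as anomalies.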
( 2
min )
In this paper, for the first time, a method is presented that can provide
fully automated surgery based on software and computer vision techniques.
We then examine the advantages and challenges of computerizing medical
surgery. Finally, surgery for isolated ovarian endometriosis is examined
and, as a proof of concept for the proposed method, a more detailed
algorithm is presented that can automatically diagnose and treat this
disease during surgery, with a U-net trained to detect endometriosis
intraoperatively.
( 2
min )
In this paper, we introduce DiarizationLM, a framework to leverage large
language models (LLM) to post-process the outputs from a speaker diarization
system. Various goals can be achieved with the proposed framework, such as
improving the readability of the diarized transcript, or reducing the word
diarization error rate (WDER). In this framework, the outputs of the automatic
speech recognition (ASR) and speaker diarization systems are represented as a
compact textual format, which is included in the prompt to an optionally
finetuned LLM. The outputs of the LLM can be used as the refined diarization
results with the desired enhancement. As a post-processing step, this framework
can be easily applied to any off-the-shelf ASR and speaker diarization systems
without retraining existing components. Our experiments show that a finetuned
PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone
conversation dataset, and rel. 44.9% on the Callhome English dataset.
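The paper's exact serialization is not reproduced here, but one plausible "compact textual format" for joint ASR and diarization output, with hypothetical speaker tags, looks like this:

```python
# Hypothetical compact serialization of ASR words with diarization labels,
# in the spirit of DiarizationLM (the exact format is an assumption).
def to_compact_text(words, speakers):
    """Merge per-word speaker labels into '<spk:K>'-delimited segments."""
    parts, current = [], None
    for word, spk in zip(words, speakers):
        if spk != current:
            parts.append(f"<spk:{spk}>")
            current = spk
        parts.append(word)
    return " ".join(parts)

words = ["good", "morning", "hi", "how", "are", "you"]
speakers = [1, 1, 2, 2, 2, 2]
prompt_body = to_compact_text(words, speakers)
print(prompt_body)  # <spk:1> good morning <spk:2> hi how are you
```

A string like this can be embedded in an LLM prompt, and the refined diarization is parsed back out of the LLM's completion.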
( 2
min )
Inspired by human conscious planning, we propose Skipper, a model-based
reinforcement learning agent utilizing spatio-temporal abstractions to
generalize learned skills in novel situations. It automatically decomposes the
given task into smaller, more manageable subtasks, and hence enables sparse
decision-making and focused computation on the relevant parts of the
environment. This relies on the extraction of an abstracted proxy problem
represented as a directed graph, in which vertices and edges are learned
end-to-end from hindsight. Our theoretical analyses provide performance
guarantees under appropriate assumptions and establish where our approach is
expected to be helpful. Generalization-focused experiments validate Skipper's
significant advantage in zero-shot generalization, compared to existing
state-of-the-art hierarchical planning methods.
( 2
min )
Mechanistic interpretability seeks to understand the internal mechanisms of
machine learning models, where localization -- identifying the important model
components -- is a key step. Activation patching, also known as causal tracing
or interchange intervention, is a standard technique for this task (Vig et al.,
2020), but the literature contains many variants with little consensus on the
choice of hyperparameters or methodology. In this work, we systematically
examine the impact of methodological details in activation patching, including
evaluation metrics and corruption methods. In several settings of localization
and circuit discovery in language models, we find that varying these
hyperparameters could lead to disparate interpretability results. Backed by
empirical observations, we give conceptual arguments for why certain metrics or
methods may be preferred. Finally, we provide recommendations for the best
practices of activation patching going forwards.
( 2
min )
We present a new representation learning framework, Intensity Profile
Projection, for continuous-time dynamic network data. Given triples $(i,j,t)$,
each representing a time-stamped ($t$) interaction between two entities
($i,j$), our procedure returns a continuous-time trajectory for each node,
representing its behaviour over time. The framework consists of three stages:
estimating pairwise intensity functions, e.g. via kernel smoothing; learning a
projection which minimises a notion of intensity reconstruction error; and
constructing evolving node representations via the learned projection. The
trajectories satisfy two properties, known as structural and temporal
coherence, which we see as fundamental for reliable inference. Moreover, we
develop estimation theory providing tight control on the error of any estimated
trajectory, indicating that the representations could even be used in quite
noise-sensitive follow-on analyses. The theory also elucidates the role of
smoothing as a bias-variance trade-off, and shows how we can reduce the level
of smoothing as the signal-to-noise ratio increases on account of the algorithm
`borrowing strength' across the network.
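The first stage, kernel-smoothed estimation of a pairwise intensity function from time-stamped interactions, can be sketched as follows; the Gaussian kernel and bandwidth are illustrative choices, not necessarily the paper's:

```python
import numpy as np

def intensity_estimate(event_times, t_grid, bandwidth=0.5):
    """Kernel-smoothed intensity: a sum of Gaussian bumps at event times."""
    diffs = (t_grid[:, None] - np.asarray(event_times)[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.sum(axis=1)

# Interactions between one node pair (i, j) at these timestamps:
events = [1.0, 1.2, 1.3, 5.0]
t_grid = np.linspace(-5, 15, 4001)
lam = intensity_estimate(events, t_grid)
dt = t_grid[1] - t_grid[0]
print(lam.sum() * dt)  # the estimate integrates to ~ the number of events
```

Larger bandwidths trade variance for bias, which is the smoothing trade-off the estimation theory makes precise.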
( 2
min )
Homeostasis is a biological process by which living beings maintain their
internal balance. Previous research suggests that homeostasis is a learned
behaviour. The recently introduced Homeostatic Regulated Reinforcement Learning
(HRRL) framework attempts to explain this learned homeostatic behavior by
linking Drive Reduction Theory and Reinforcement Learning. This linkage has
been proven in the discrete time-space, but not in the continuous time-space.
In this work, we advance the HRRL framework to a continuous time-space
environment and validate the CTCS-HRRL (Continuous Time Continuous Space HRRL)
framework. We achieve this by designing a model that mimics the homeostatic
mechanisms in a real-world biological agent. This model uses the
Hamilton-Jacobi-Bellman equation, and function approximation based on neural
networks and Reinforcement Learning. Through a simulation-based experiment we
demonstrate the efficacy of this model and uncover the evidence linked to the
agent's ability to dynamically choose policies that favor homeostasis in a
continuously changing internal-state milieu. Results of our experiments
demonstrate that the agent learns homeostatic behaviour in a CTCS environment,
making CTCS-HRRL a promising framework for modelling animal dynamics and
decision-making.
( 2
min )
This paper studies the effect of adding geometrically smoothed momentum to
the randomized Kaczmarz algorithm, which is an instance of stochastic gradient
descent on a linear least squares loss function. We prove a result about the
expected error in the direction of singular vectors of the matrix defining the
least squares loss. We present several numerical examples illustrating the
utility of our result and pose several questions.
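For reference, randomized Kaczmarz with a heavy-ball style momentum term on a consistent least-squares system looks roughly like the sketch below; the geometric smoothing studied in the paper is a particular way of weighting past iterates, and the fixed beta here is a simplification:

```python
import numpy as np

def randomized_kaczmarz_momentum(A, b, beta=0.3, iters=2000, seed=0):
    """Randomized Kaczmarz with a simple momentum term beta*(x_k - x_{k-1})."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms = np.sum(A**2, axis=1)
    probs = row_norms / row_norms.sum()   # standard row-sampling distribution
    x_prev = np.zeros(n)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        # Project onto the hyperplane {z : <a_i, z> = b_i}, plus momentum.
        step = (b[i] - A[i] @ x) / row_norms[i] * A[i]
        x_next = x + step + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 10))
x_true = rng.normal(size=10)
b = A @ x_true                      # consistent overdetermined system
x_hat = randomized_kaczmarz_momentum(A, b)
print(np.linalg.norm(x_hat - x_true))
```

Because the system is consistent, the per-row update noise vanishes at the solution, so the iterates converge to the exact least-squares solution.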
( 2
min )
We study a streamable attention-based encoder-decoder model in which either
the decoder, or both the encoder and decoder, operate on pre-defined,
fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances
from one chunk to the next chunk, effectively replacing the conventional
end-of-sequence symbol. This modification, while minor, situates our model as
equivalent to a transducer model that operates on chunks instead of frames,
where EOC corresponds to the blank symbol. We further explore the remaining
differences between a standard transducer and our model. Additionally, we
examine relevant aspects such as long-form speech generalization, beam size,
and length normalization. Through experiments on Librispeech and TED-LIUM-v2,
and by concatenating consecutive sequences for long-form trials, we find that
our streamable model maintains competitive performance compared to the
non-streamable variant and generalizes very well to long-form speech.
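The EOC mechanism can be illustrated by how a label sequence is laid out over chunks; the token name and layout below are illustrative assumptions, not the paper's exact inventory:

```python
# Illustrative layout of output labels over fixed-size chunks, where a
# special end-of-chunk (EOC) token advances the decoder to the next chunk.
EOC = "<eoc>"

def lay_out_labels(labels_per_chunk):
    """Flatten per-chunk label lists into one stream with EOC separators."""
    stream = []
    for chunk_labels in labels_per_chunk:
        stream.extend(chunk_labels)
        stream.append(EOC)  # plays the role of the transducer blank symbol
    return stream

# Three chunks of audio; the middle one happens to contain no labels.
chunks = [["the", "cat"], [], ["sat"]]
stream = lay_out_labels(chunks)
print(stream)  # ['the', 'cat', '<eoc>', '<eoc>', 'sat', '<eoc>']
```

An empty chunk emits a bare EOC, which is exactly how a transducer emits a blank for a frame with no output.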
( 2
min )
We introduce a novel capacity measure 2sED for statistical models based on
the effective dimension. The new quantity provably bounds the generalization
error under mild assumptions on the model. Furthermore, simulations on standard
data sets and popular model architectures show that 2sED correlates well with
the training error. For Markovian models, we show how to efficiently
approximate 2sED from below through a layerwise iterative approach, which
allows us to tackle deep learning models with a large number of parameters.
Simulation results suggest that the approximation is good for different
prominent models and data sets.
( 2
min )
Despite the success of deep learning-based algorithms, it is widely known
that neural networks may fail to be robust. A popular paradigm to enforce
robustness is adversarial training (AT), however, this introduces many
computational and theoretical difficulties. Recent works have developed a
connection between AT in the multiclass classification setting and
multimarginal optimal transport (MOT), unlocking a new set of tools to study
this problem. In this paper, we leverage the MOT connection to propose
computationally tractable numerical algorithms for computing universal lower
bounds on the optimal adversarial risk and identifying optimal classifiers. We
propose two main algorithms based on linear programming (LP) and entropic
regularization (Sinkhorn). Our key insight is that one can harmlessly truncate
the higher order interactions between classes, preventing the combinatorial run
times typically encountered in MOT problems. We validate these results with
experiments on MNIST and CIFAR-$10$, which demonstrate the tractability of our
approach.
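The entropic-regularization building block referenced above is, in its generic two-marginal form, the standard Sinkhorn algorithm; the paper's truncated multimarginal version is more involved, so this is only the basic primitive:

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=1.0, iters=500):
    """Standard two-marginal Sinkhorn iterations for entropic OT."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(iters):
        u = mu / (K @ v)          # match the row marginal
        v = nu / (K.T @ u)        # match the column marginal
    return u[:, None] * K * v[None, :]   # transport plan

n = 5
C = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]).astype(float)
mu = np.full(n, 1 / n)
nu = np.full(n, 1 / n)
P = sinkhorn(C, mu, nu)
print(P.sum(axis=1))  # ~ mu after convergence
```

Each iteration rescales rows and columns alternately, so at convergence the plan P has the prescribed marginals.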
( 2
min )
Motivated by the entropic optimal transport problem in unbounded settings, we
study versions of Hilbert's projective metric for spaces of integrable
functions of bounded growth. These versions of Hilbert's metric originate from
cones which are relaxations of the cone of all non-negative functions, in the
sense that they include all functions having non-negative integral values when
multiplied with certain test functions. We show that kernel integral operators
are contractions with respect to suitable specifications of such metrics even
for kernels which are not bounded away from zero, provided that the decay to
zero of the kernel is controlled. As an application to entropic optimal
transport, we show exponential convergence of Sinkhorn's algorithm in settings
where the marginal distributions have sufficiently light tails compared to the
growth of the cost function.
( 2
min )
In the realm of machine learning and statistical modeling, practitioners
often work under the assumption of accessible, static, labeled data for
evaluation and training. However, this assumption often deviates from reality
where data may be private, encrypted, difficult-to-measure, or unlabeled. In
this paper, we bridge this gap by adapting the Hui-Walter paradigm, a method
traditionally applied in epidemiology and medicine, to the field of machine
learning. This approach enables us to estimate key performance metrics such as
false positive rate, false negative rate, and priors in scenarios where no
ground truth is available. We further extend this paradigm for handling online
data, opening up new possibilities for dynamic data environments. Our
methodology involves partitioning data into latent classes to simulate multiple
data populations (if natural populations are unavailable) and independently
training models to replicate multiple tests. By cross-tabulating binary
outcomes across ensemble categorizers and multiple populations, we are able to
estimate unknown parameters through Gibbs sampling, eliminating the need for
ground-truth or labeled data. This paper showcases the potential of our
methodology to transform machine learning practices by allowing for accurate
model assessment under dynamic and uncertain data conditions.
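A minimal sketch of the Hui-Walter setup with Gibbs sampling: two populations with different prevalences, two binary "tests" (simulated here rather than produced by ensemble categorizers), and conjugate Beta updates. Priors, initialization, and iteration counts are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth, used only to simulate data (unknown to the sampler).
true_prev = [0.2, 0.7]          # prevalence in each population
true_se = [0.90, 0.85]          # sensitivity of each test
true_sp = [0.95, 0.90]          # specificity of each test
n_per_pop = 2000

pops, tests = [], []
for p in range(2):
    d = rng.random(n_per_pop) < true_prev[p]              # latent status
    t = np.stack([np.where(d, rng.random(n_per_pop) < true_se[j],
                              rng.random(n_per_pop) < 1 - true_sp[j])
                  for j in range(2)], axis=1)
    pops.append(np.full(n_per_pop, p))
    tests.append(t)
pop = np.concatenate(pops)
T = np.concatenate(tests).astype(float)                    # (N, 2) binary

# Gibbs: alternate latent disease states and Beta-conjugate parameter draws.
prev = np.array([0.5, 0.5])
se = np.array([0.9, 0.9])       # init near the non-label-switched mode
sp = np.array([0.9, 0.9])
draws = []
for it in range(600):
    # P(D=1 | test results, params) for every individual, via Bayes' rule.
    like1 = prev[pop] * np.prod(se**T * (1 - se)**(1 - T), axis=1)
    like0 = (1 - prev[pop]) * np.prod((1 - sp)**T * sp**(1 - T), axis=1)
    d = rng.random(len(T)) < like1 / (like1 + like0)
    for p in range(2):
        m = pop == p
        prev[p] = rng.beta(1 + d[m].sum(), 1 + (~d[m]).sum())
    for j in range(2):
        se[j] = rng.beta(1 + T[d, j].sum(), 1 + (1 - T[d, j]).sum())
        sp[j] = rng.beta(1 + (1 - T[~d, j]).sum(), 1 + T[~d, j].sum())
    if it >= 300:
        draws.append(se.copy())
print(np.mean(draws, axis=0))   # posterior mean sensitivities
```

No ground-truth labels enter the sampler; with two tests and two populations of differing prevalence the model is identified, so the posterior concentrates near the simulated sensitivities.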
( 2
min )
Two-timescale stochastic approximation (TTSA) is among the most general
frameworks for iterative stochastic algorithms. This includes well-known
stochastic optimization methods such as SGD variants and those designed for
bilevel or minimax problems, as well as reinforcement learning methods such
as the family of gradient-based temporal difference (GTD) algorithms. In
this paper, we conduct an in-depth asymptotic analysis of TTSA under
controlled Markovian noise via a central limit theorem (CLT), uncovering the
coupled dynamics of TTSA influenced by the underlying Markov chain, which
previous CLT results for TTSA, restricted to martingale difference noise,
did not address. Building upon our CLT, we extend efficient sampling
strategies from vanilla SGD to the wider TTSA context in distributed
learning, thus broadening the scope of Hu et al. (2022).
result to deduce the statistical properties of GTD algorithms with nonlinear
function approximation using Markovian samples and show their identical
asymptotic performance, a perspective not evident from current finite-time
bounds.
( 2
min )
MIT CSAIL researchers develop advanced machine-learning models that outperform current methods in detecting pancreatic ductal adenocarcinoma.
( 9
min )
PhD students interning with the MIT-IBM Watson AI Lab look to improve natural language usage.
( 10
min )
Mark Swinnerton aims to fight climate change by transforming abandoned mines into storage tanks of renewable energy. The CEO of startup Green Gravity is prototyping his ambitious vision in a warehouse 60 miles south of Sydney, Australia, and simulating it in NVIDIA Omniverse, a platform for building 3D workflows and applications.
( 6
min )
Hold on to your seats — this GFN Thursday is unleashing dinosaurs, crowns and more in the cloud. Catch it all on Capcom’s Exoprimal and Ubisoft’s Prince of Persia: The Lost Crown, leading 10 new games joining the GeForce NOW library this week.
( 6
min )
In this work, we study the deep signature algorithms for path-dependent
options. We extend the backward scheme in [Hur\'e-Pham-Warin. Mathematics of
Computation 89, no. 324 (2020)] for state-dependent FBSDEs with reflections to
path-dependent FBSDEs with reflections, by adding the signature layer to the
backward scheme. Our algorithm applies to both European and American type
option pricing problems while the payoff function depends on the whole paths of
the underlying forward stock process. We prove the convergence analysis of our
numerical algorithm with explicit dependence on the truncation order of the
signature and the neural network approximation errors. Numerical examples for
the algorithm are provided including: Amerasian option under the Black-Scholes
model, American option with a path-dependent geometric mean payoff function,
and the Shiryaev's optimal stopping problem.
( 2
min )
Popular guidance for denoising diffusion probabilistic model (DDPM) linearly
combines distinct conditional models together to provide enhanced control over
samples. However, this approach overlooks nonlinear effects that become
significant when guidance scale is large. To address this issue, we propose
characteristic guidance, a sampling method that provides first-principle
non-linear correction for classifier-free guided DDPMs. Such correction forces
the guided DDPMs to respect the Fokker-Planck equation of their underlying
diffusion process, in a way that is training-free, derivative-free, and
compatible with existing sampling methods. Experiments show that characteristic
guidance enhances control and reduces color and exposure issues in image
generation, proving effective in diverse applications ranging from latent space
sampling to solving physics problems like magnet phase transitions.
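For context, the "linear combination" baseline that characteristic guidance corrects is the standard classifier-free guidance rule; a schematic with toy stand-ins for the model's noise predictions (the nonlinear correction itself is not reproduced here):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard (linear) classifier-free guidance combination of noise
    predictions; the nonlinear effects the paper corrects become
    significant when w is large."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy stand-ins for conditional / unconditional noise predictions.
x = np.linspace(-1, 1, 5)
eps_u = 0.5 * x
eps_c = 0.5 * x + 0.2
guided = cfg_combine(eps_u, eps_c, w=3.0)
print(guided - eps_u)  # conditional direction amplified by w: 3 * 0.2 = 0.6
```

Characteristic guidance replaces this purely linear extrapolation with a correction that keeps the guided process consistent with its Fokker-Planck equation.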
( 2
min )
Because of its privacy-preserving capability, federated learning (FL) has
attracted significant attention from both academia and industry. However, when
being implemented over wireless networks, it is not clear how much
communication error can be tolerated by FL. This paper investigates the
robustness of FL to the uplink and downlink communication error. Our
theoretical analysis reveals that the robustness depends on two critical
parameters, namely the number of clients and the numerical range of model
parameters. It is also shown that the uplink communication in FL can tolerate a
higher bit error rate (BER) than downlink communication, and this difference is
quantified by a proposed formula. The findings and theoretical analyses are
further validated by extensive experiments.
( 2
min )
Algorithmic generalization in machine learning refers to the ability to learn
the underlying algorithm that generates data in a way that generalizes
out-of-distribution. This is generally considered a difficult task for most
machine learning algorithms. Here, we analyze algorithmic generalization when
counting is required, either implicitly or explicitly. We show that standard
Transformers are based on architectural decisions that hinder
out-of-distribution performance for such tasks. In particular, we discuss the
consequences of using layer normalization and of normalizing the attention
weights via softmax. With ablation of the problematic operations, we
demonstrate that a modified transformer can exhibit a good algorithmic
generalization performance on counting while using a very lightweight
architecture.
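The softmax issue can be demonstrated directly: because attention weights are normalized to sum to one, the attention output is a weighted average of values and is therefore invariant to how many matching tokens appear. A toy example (illustrative, not the paper's exact ablation):

```python
import numpy as np

def attention_output(n_matching):
    """Uniform softmax attention over n tokens whose values are all 1.0:
    the output is an average, so the count n is invisible."""
    scores = np.zeros(n_matching)          # identical keys -> equal scores
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax sums to 1
    values = np.ones(n_matching)
    return weights @ values

print(attention_output(3), attention_output(10))  # both ~1.0, regardless of n
```

An unnormalized aggregation (e.g., summing rather than averaging the value vectors) would grow with the count, which is why removing the normalization helps on counting tasks.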
( 2
min )
In this paper, we propose a method for knowledge graph construction in power
distribution networks. This method leverages entity features, which involve
their semantic, phonetic, and syntactic characteristics, in both the knowledge
graph of distribution network and the dispatching texts. An enhanced model
based on Convolutional Neural Network, is utilized for effectively matching
dispatch text entities with those in the knowledge graph. The effectiveness of
this model is evaluated through experiments in real-world power distribution
dispatch scenarios. The results indicate that, compared with the baselines, the
proposed model excels in linking a variety of entity types, demonstrating high
overall accuracy in power distribution knowledge graph construction task.
( 2
min )
Understanding model's sensitivity to its training data is crucial but can
also be challenging and costly, especially during training. To simplify such
issues, we present the Memory-Perturbation Equation (MPE) which relates model's
sensitivity to perturbation in its training data. Derived using Bayesian
principles, the MPE unifies existing sensitivity measures, generalizes them to
a wide-variety of models and algorithms, and unravels useful properties
regarding sensitivities. Our empirical results show that sensitivity estimates
obtained during training can be used to faithfully predict generalization on
unseen test data. The proposed equation is expected to be useful for future
research on robust and adaptive learning.
( 2
min )
Accelerating compute intensive non-real-time beam-forming algorithms in
ultrasound imaging using deep learning architectures has been gaining momentum
in the recent past. Nonetheless, the complexity of the state-of-the-art deep
learning techniques poses challenges for deployment on resource-constrained
edge devices. In this work, we propose a novel vision transformer based tiny
beamformer (Tiny-VBF), which works on the raw radio-frequency channel data
acquired through single-angle plane wave insonification. The output of our
Tiny-VBF provides fast envelope detection requiring very low frame rate, i.e.
0.34 GOPs/Frame for a frame size of 368 x 128 in comparison to the
state-of-the-art deep learning models. It also exhibited an 8% increase in
contrast and gains of 5% and 33% in axial and lateral resolution respectively
when compared to Tiny-CNN on in-vitro dataset. Additionally, our model showed a
4.2% increase in contrast and gains of 4% and 20% in axial and lateral
resolution respectively when compared against conventional Delay-and-Sum (DAS)
beamformer. We further propose an accelerator architecture and implement our
Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA using a hybrid
quantization scheme with 50% less resource consumption compared to the
floating-point implementation, while preserving the image quality.
( 2
min )
Partial differential equations are often used in the spatial-temporal
modeling of complex dynamical systems in many engineering applications. In this
work, we build on the recent progress of operator learning and present a
data-driven modeling framework that is continuous in both space and time. A key
feature of the proposed model is the resolution-invariance with respect to both
spatial and temporal discretizations, without demanding abundant training data
in different temporal resolutions. To improve the long-term performance of the
calibrated model, we further propose a hybrid optimization scheme that
leverages both gradient-based and derivative-free optimization methods and
efficiently trains on both short-term time series and long-term statistics. We
investigate the performance of the spatial-temporal continuous learning
framework with three numerical examples, including the viscous Burgers'
equation, the Navier-Stokes equations, and the Kuramoto-Sivashinsky equation.
The results confirm the resolution-invariance of the proposed modeling
framework and also demonstrate stable long-term simulations with only
short-term time series data. In addition, we show that the proposed model can
better predict long-term statistics via the hybrid optimization scheme with a
combined use of short-term and long-term data.
( 2
min )
Machine learning (ML) applications in medical artificial intelligence (AI)
systems have shifted from traditional and statistical methods to increasing
application of deep learning models. This survey navigates the current
landscape of multimodal ML, focusing on its profound impact on medical image
analysis and clinical decision support systems. Emphasizing challenges and
innovations in addressing multimodal representation, fusion, translation,
alignment, and co-learning, the paper explores the transformative potential of
multimodal models for clinical predictions. It also questions practical
implementation of such models, bringing attention to the dynamics between
decision support systems and healthcare providers. Despite advancements,
challenges such as data biases and the scarcity of "big data" in many
biomedical domains persist. We conclude with a discussion on effective
innovation and collaborative efforts to further the mission.
( 2
min )
This article studies how to intervene against statistical discrimination,
when it is based on beliefs generated by machine learning, rather than by
humans. Unlike beliefs formed by a human mind, machine learning-generated
beliefs are verifiable. This allows interventions to move beyond simple,
belief-free designs like affirmative action, to more sophisticated ones, that
constrain decision makers in ways that depend on what they are thinking. Such
mind reading interventions can perform well where affirmative action does not,
even when the beliefs being conditioned on are possibly incorrect and biased.
( 2
min )
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL
-- have seen recent use as a way to express non-Markovian objectives in
reinforcement learning. We introduce a model-based probably approximately
correct (PAC) learning algorithm for omega-regular objectives in Markov
decision processes (MDPs). As part of the development of our algorithm, we
introduce the epsilon-recurrence time: a measure of the speed at which a policy
converges to the satisfaction of the omega-regular objective in the limit. We
prove that our algorithm only requires a polynomial number of samples in the
relevant parameters, and perform experiments which confirm our theory.
( 2
min )
My research investigates the use of cutting-edge hybrid deep learning models
to accurately differentiate between AI-generated text and human writing. I
applied a robust methodology, utilising a carefully selected dataset comprising
AI and human texts from various sources, each tagged with instructions.
Advanced natural language processing techniques facilitated the analysis of
textual features. By combining sophisticated neural networks, the custom
model was able to detect nuanced differences between AI and human content.
( 2
min )
The correlation between the sharpness of loss minima and generalisation in
the context of deep neural networks has been subject to discussion for a long
time. Whilst mostly investigated in the context of selected benchmark data sets
in the area of computer vision, we explore this aspect for the acoustic scene
classification task of the DCASE2020 challenge data. Our analysis is based on
two-dimensional filter-normalised visualisations and a derived sharpness
measure. Our exploratory analysis shows that sharper minima tend to show better
generalisation than flat minima (even more so for out-of-domain data recorded
from previously unseen devices), thus adding to the dispute about better
generalisation capabilities of flat minima. We further find that, in
particular, the choice of optimisers is a main driver of the sharpness of
minima and we discuss resulting limitations with respect to comparability. Our
code, trained model states and loss landscape visualisations are publicly
available.
( 2
min )
Traditional data-driven deep learning models often struggle with high
training costs, error accumulation, and poor generalizability in complex
physical processes. Physics-informed deep learning (PiDL) addresses these
challenges by incorporating physical principles into the model. Most PiDL
approaches regularize training by embedding governing equations into the loss
function, yet this depends heavily on extensive hyperparameter tuning to weigh
each loss term. To this end, we propose to leverage physics prior knowledge by
``baking'' the discretized governing equations into the neural network
architecture via the connection between the partial differential equations
(PDE) operators and network structures, resulting in a PDE-preserved neural
network (PPNN). This method, embedding discretized PDEs through convolutional
residual networks in a multi-resolution setting, largely improves the
generalizability and long-term prediction accuracy, outperforming conventional
black-box models. The effectiveness and merit of the proposed methods have been
demonstrated across various spatiotemporal dynamical systems governed by
spatiotemporal PDEs, including reaction-diffusion, Burgers', and Navier-Stokes
equations.
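The core correspondence, a discretized PDE operator expressed as a fixed convolution kernel, can be seen in a toy heat-equation step; the residual-network wrapping and multi-resolution machinery of PPNN are omitted:

```python
import numpy as np

# Five-point finite-difference Laplacian as a convolution kernel. In a
# PDE-preserved network this kernel is "baked in" as a fixed,
# non-trainable convolution inside a residual block.
LAPLACIAN_KERNEL = np.array([[0., 1., 0.],
                             [1., -4., 1.],
                             [0., 1., 0.]])

def conv2d_same(u, k):
    """Naive 'same' 2-D convolution with zero padding (kernel is symmetric)."""
    pad = np.pad(u, 1)
    out = np.zeros_like(u)
    for i in range(u.shape[0]):
        for j in range(u.shape[1]):
            out[i, j] = np.sum(pad[i:i+3, j:j+3] * k)
    return out

def heat_step(u, alpha=0.1):
    """One explicit Euler step of u_t = alpha * Laplacian(u) on a unit grid."""
    return u + alpha * conv2d_same(u, LAPLACIAN_KERNEL)

u = np.zeros((16, 16))
u[8, 8] = 1.0               # point source
for _ in range(10):
    u = heat_step(u)
print(u.sum())               # mass ~1: the source has not reached the boundary
```

The residual form `u + alpha * conv(u)` is exactly the structure of a residual block, which is the connection between PDE operators and network architecture that the abstract describes.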
( 2
min )
We introduce SPIRAL, a SuPerlinearly convergent Incremental pRoximal
ALgorithm, for solving nonconvex regularized finite sum problems under a
relative smoothness assumption. Each iteration of SPIRAL consists of an inner
and an outer loop. It combines incremental gradient updates with a linesearch
that has the remarkable property of never being triggered asymptotically,
leading to superlinear convergence under mild assumptions at the limit point.
Simulation results with L-BFGS directions on different convex, nonconvex, and
non-Lipschitz differentiable problems show that our algorithm, as well as its
adaptive variant, are competitive to the state of the art.
( 2
min )
The singular subspaces perturbation theory is of fundamental importance in
probability and statistics. It has various applications across different
fields. We consider two arbitrary matrices where one is a leave-one-column-out
submatrix of the other one and establish a novel perturbation upper bound for
the distance between the two corresponding singular subspaces. It is
well-suited for mixture models and results in a sharper and finer statistical
analysis than classical perturbation bounds such as Wedin's Theorem. Empowered
by this leave-one-out perturbation theory, we provide a deterministic entrywise
analysis for the performance of spectral clustering under mixture models. Our
analysis leads to an explicit exponential error rate for spectral clustering of
sub-Gaussian mixture models. For the mixture of isotropic Gaussians, the rate
is optimal under a weaker signal-to-noise condition than that of L{\"o}ffler et
al. (2021).
( 2
min )
Sequential recommendation models, models that learn from chronological
user-item interactions, outperform traditional recommendation models in many
settings. Despite the success of sequential recommendation models, their
robustness has recently come into question. Two properties unique to the nature
of sequential recommendation models may impair their robustness - the cascade
effects induced during training and the model's tendency to rely too heavily on
temporal information. To address these vulnerabilities, we propose
Cascade-guided Adversarial training, a new adversarial training procedure that
is specifically designed for sequential recommendation models. Our approach
harnesses the intrinsic cascade effects present in sequential modeling to
produce strategic adversarial perturbations to item embeddings during training.
Experiments on training state-of-the-art sequential models on four public
datasets from different domains show that our training approach produces
superior model ranking accuracy and superior model robustness to real item
replacement perturbations when compared to both standard model training and
generic adversarial training.
( 2
min )
The introduction of computerized medical records in hospitals has reduced
burdensome activities like manual writing and information fetching. However,
the data contained in medical records are still far underutilized, primarily
because extracting data from unstructured textual medical records takes time
and effort. Information Extraction, a subfield of Natural Language Processing,
can help clinical practitioners overcome this limitation by using automated
text-mining pipelines. In this work, we created the first Italian
neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to
develop a Transformers-based model. Moreover, we collected and leveraged three
external independent datasets to implement an effective multicenter model, with
overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned
are: (i) the crucial role of a consistent annotation process and (ii) a
fine-tuning strategy that combines classical methods with a "low-resource"
approach. This allowed us to establish methodological guidelines that pave the
way for Natural Language Processing studies in less-resourced languages.
( 3
min )
MCMC algorithms offer empirically efficient tools for sampling from a target
distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC
algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave. Our
work examines this gap and shows that when Poincar\'e-style inequality holds on
a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC
iterates over $\mathcal{X}$ mixes fast to the true conditional distribution.
This fast mixing guarantee can hold in cases when global mixing is provably
slow. We formalize the statement and quantify the conditional mixing rate. We
further show that conditional mixing can have interesting implications for
sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture
models and Gibbs-sampling with well-connected local minima.
( 2
min )
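The claim that a chain can mix fast conditionally on a subset even when global mixing is slow can be illustrated with a toy sketch (not the paper's construction; a 1-D mixture of two well-separated unit Gaussians is assumed): a random-walk Metropolis chain started in one mode rarely crosses to the other, yet its samples restricted to that mode quickly match the true conditional distribution there.

```python
import math, random

def log_density(x):
    # Unnormalized log-density of a mixture of unit Gaussians at -4 and +4.
    return math.log(math.exp(-0.5 * (x - 4.0) ** 2) + math.exp(-0.5 * (x + 4.0) ** 2))

def metropolis(x0, steps, step_size, rng):
    x, samples = x0, []
    for _ in range(steps):
        prop = x + rng.gauss(0.0, step_size)
        accept_logp = min(0.0, log_density(prop) - log_density(x))
        if rng.random() < math.exp(accept_logp):
            x = prop
        samples.append(x)
    return samples

rng = random.Random(0)
chain = metropolis(4.0, 20000, 1.0, rng)
right = [x for x in chain if x > 0]        # condition on the subset X = (0, inf)
cond_mean = sum(right) / len(right)
print(round(cond_mean, 2))                 # close to +4, the true conditional mean
```

Crossing the low-density barrier at 0 requires climbing roughly 8 nats, so the chain almost never leaves the right mode within the run, yet the conditional statistics over that mode are accurate.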
We study gradient descent under linearly correlated noise. Our work is
motivated by recent practical methods for optimization with differential
privacy (DP), such as DP-FTRL, which achieve strong performance in settings
where privacy amplification techniques are infeasible (such as in federated
learning). These methods inject privacy noise through a matrix factorization
mechanism, making the noise linearly correlated over iterations. We propose a
simplified setting that distills key facets of these methods and isolates the
impact of linearly correlated noise. We analyze the behavior of gradient
descent in this setting, for both convex and non-convex functions. Our analysis
is demonstrably tighter than prior work and recovers multiple important special
cases exactly (including anticorrelated perturbed gradient descent). We use our
results to develop new, effective matrix factorizations for differentially
private optimization, and highlight the benefits of these factorizations
theoretically and empirically.
( 2
min )
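The matrix-factorization view can be sketched in one dimension (an illustrative toy, not DP-FTRL itself): injecting first-difference, i.e. anticorrelated, noise into gradient descent on a quadratic lets successive perturbations partially cancel in the iterates, compared with i.i.d. noise of the same scale.

```python
import random, statistics

def run_gd(noise, eta=0.1, x0=5.0):
    # Gradient descent on f(x) = x^2 / 2 with additive noise on each gradient.
    x = x0
    for n in noise:
        x -= eta * (x + n)
    return x

def final_error(anticorrelated, seed, steps=200):
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(steps)]
    if anticorrelated:
        # First-difference factorization: the injected noise is linearly
        # correlated across iterations and successive perturbations cancel.
        z = [z[0]] + [z[t] - z[t - 1] for t in range(1, steps)]
    return abs(run_gd(z))

iid = statistics.mean(final_error(False, s) for s in range(50))
anti = statistics.mean(final_error(True, s) for s in range(50))
print(round(iid, 3), round(anti, 3))   # anticorrelated noise leaves a smaller error
```

Averaged over seeds, the anticorrelated run ends closer to the optimum, which is the effect the paper's analysis quantifies exactly for special cases such as anticorrelated perturbed gradient descent.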
Recurrent neural networks (RNNs) are a class of neural networks that have
emerged from the paradigm of artificial intelligence and have enabled many
interesting advances in the field of natural language processing.
Interestingly, these architectures were shown to be powerful ansatze to
approximate the ground state of quantum systems. Here, we build over the
results of [Phys. Rev. Research 2, 023358 (2020)] and construct a more powerful
RNN wave function ansatz in two dimensions. We use symmetry and annealing to
obtain accurate estimates of ground state energies of the two-dimensional (2D)
Heisenberg model, on the square lattice and on the triangular lattice. We show
that our method is superior to Density Matrix Renormalisation Group (DMRG) for
system sizes larger than or equal to $14 \times 14$ on the triangular lattice.
( 2
min )
We study the convergence of stochastic gradient descent (SGD) for non-convex
objective functions. We establish the local convergence with positive
probability under the local \L{}ojasiewicz condition introduced by Chatterjee
in \cite{chatterjee2022convergence} and an additional local structural
assumption of the loss function landscape. A key component of our proof is to
ensure that the whole trajectories of SGD stay inside the local region with a
positive probability. We also provide examples of neural networks with finite
widths such that our assumptions hold.
( 2
min )
In industrial deep learning applications, manually labeled data often contains a
certain amount of label noise. To address this problem and reach a score above 90
on the dev dataset, we present a simple method for finding the noisy data and
having humans re-label it, with the model predictions given as references during
human labeling. In this paper, we illustrate our idea for a broad set of deep
learning tasks, including classification, sequence tagging, object detection,
sequence generation, and click-through rate prediction. The dev-dataset evaluation
results and human evaluation results verify our idea.
( 2
min )
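A minimal sketch of the relabeling idea (a hypothetical helper, not the paper's pipeline): flag examples whose model-assigned probability for the given label falls below a threshold, and return them together with the model's top prediction as a reference for annotators.

```python
def flag_noisy(labels, probs, threshold=0.2):
    # Flag examples whose model probability for the assigned label is below the
    # threshold; return (index, current label, model's top prediction) so that
    # annotators can use the prediction as a reference while re-labeling.
    flagged = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        if p[y] < threshold:
            pred = max(range(len(p)), key=p.__getitem__)
            flagged.append((i, y, pred))
    return flagged

labels = [0, 1, 1, 2]
probs = [[0.90, 0.05, 0.05],   # confident and consistent with label 0
         [0.80, 0.15, 0.05],   # labeled 1, but model prefers 0 -> flagged
         [0.10, 0.85, 0.05],   # consistent with label 1
         [0.40, 0.55, 0.05]]   # labeled 2 with probability 0.05 -> flagged
print(flag_noisy(labels, probs))  # [(1, 1, 0), (3, 2, 1)]
```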
We examine the relationship between the mutual information between the output
model and the empirical sample and the generalization of the algorithm in the
context of stochastic convex optimization. Despite increasing interest in
information-theoretic generalization bounds, it is uncertain if these bounds
can provide insight into the exceptional performance of various learning
algorithms. Our study of stochastic convex optimization reveals that, for true
risk minimization, dimension-dependent mutual information is necessary. This
indicates that existing information-theoretic generalization bounds fall short
in capturing the generalization capabilities of algorithms like SGD and
regularized ERM, which have dimension-independent sample complexity.
( 2
min )
Despite the significant progress made by transformer models in machine
reading comprehension tasks, they still fall short in handling complex
reasoning tasks due to the absence of explicit knowledge in the input sequence.
To address this limitation, many recent works have proposed injecting external
knowledge into the model. However, selecting relevant external knowledge,
ensuring its availability, and requiring additional processing steps remain
challenging. In this paper, we introduce a novel attention pattern that
integrates reasoning knowledge derived from a heterogeneous graph into the
transformer architecture without relying on external knowledge. The proposed
attention pattern comprises three key elements: global-local attention for word
tokens, graph attention for entity tokens that exhibit strong attention towards
tokens connected in the graph as opposed to those unconnected, and the
consideration of the type of relationship between each entity token and word
token. This results in optimized attention between the two if a relationship
exists. The pattern is coupled with special relative position labels, allowing
it to integrate with LUKE's entity-aware self-attention mechanism. The
experimental findings corroborate that our model outperforms both the
cutting-edge LUKE-Graph and the baseline LUKE model across two distinct
datasets: ReCoRD, emphasizing commonsense reasoning, and WikiHop, focusing on
multi-hop reasoning challenges.
( 3
min )
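The described attention pattern can be sketched as a boolean mask (an illustrative construction only; the paper folds the pattern into LUKE's entity-aware self-attention with relative position labels rather than a plain mask): word tokens use global-local attention, and entity tokens additionally attend to graph-connected entities.

```python
def build_attention_mask(n_tokens, entity_edges, window=2, global_tokens=(0,)):
    # Boolean mask, True = attention allowed.
    mask = [[False] * n_tokens for _ in range(n_tokens)]
    for i in range(n_tokens):
        for j in range(n_tokens):
            if i in global_tokens or j in global_tokens:
                mask[i][j] = True          # global tokens see and are seen by all
            elif abs(i - j) <= window:
                mask[i][j] = True          # local sliding-window attention
    for a, b in entity_edges:              # graph attention: entities connected
        mask[a][b] = mask[b][a] = True     # in the graph attend to each other
    return mask

# Entity mentions at positions 3 and 7, connected in the heterogeneous graph.
mask = build_attention_mask(8, entity_edges=[(3, 7)])
print(mask[3][7], mask[3][6])  # True False: graph edge allowed, distant word not
```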
Characters do not convey meaning, but sequences of characters do. We propose
an unsupervised distributional method to learn the abstract meaningful units in
a sequence of characters. Rather than segmenting the sequence, our Dynamic
Capacity Slot Attention model discovers continuous representations of the
objects in the sequence, extending an architecture for object discovery in
images. We train our model on different languages and evaluate the quality of
the obtained representations with forward and reverse probing classifiers.
These experiments show that our model succeeds in discovering units which are
similar to those proposed previously in form, content and level of abstraction,
and which show promise for capturing meaningful information at a higher level
of abstraction.
( 2
min )
Recent years have seen rapid development of descriptor generation based on
representation learning of extremely diverse molecules, especially those that
apply natural language processing (NLP) models to SMILES, a literal
representation of molecular structure. However, little research has been done
on how these models understand chemical structure. To look inside this black box,
we investigated the relationship between the learning progress of SMILES and
chemical structure using a representative NLP model, the Transformer. We show
that while the Transformer learns partial structures of molecules quickly, it
requires extended training to understand overall structures. Consistently, the
accuracy of molecular property predictions using descriptors generated from
models at different learning steps was similar from the beginning to the end of
training. Furthermore, we found that the Transformer requires particularly long
training to learn chirality and sometimes stagnates with low performance due to
misunderstanding of enantiomers. These findings are expected to deepen the
understanding of NLP models in chemistry.
( 2
min )
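For readers unfamiliar with SMILES, a minimal tokenizer sketch (not from the paper) shows the kind of input units such models see; multi-character tokens like `Cl`, `Br`, and bracketed atoms must be matched before single characters.

```python
import re

# Simplified SMILES token pattern: bracket atoms, two-letter elements,
# single-letter (aromatic) atoms, stereo markers, bonds, ring digits, branches.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|se|[BCNOPSFIbcnops]|@@|@|[=#\\/\-+()%\d])"
)

def tokenize(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Sanity check: every character must belong to some token.
    assert "".join(tokens) == smiles, "untokenizable characters present"
    return tokens

print(tokenize("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```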
Photovoltaic (PV) power generation has emerged as one of the lead renewable
energy sources. Yet, its production is characterized by high uncertainty, being
dependent on weather conditions like solar irradiance and temperature.
Predicting PV production, even in the 24-hour forecast, remains a challenge and
leads energy providers to keep plants, often carbon-emitting ones, idling. In this
paper, we introduce a Long-Term Recurrent Convolutional Network using Numerical
Weather Predictions (NWP) to predict, in turn, PV production in the 24-hour and
48-hour forecast horizons. This network architecture fully leverages both
temporal and spatial weather data, sampled over the whole geographical area of
interest. We train our model on an NWP dataset from the National Oceanic and
Atmospheric Administration (NOAA) to predict spatially aggregated PV production
in Germany. We compare its performance to the persistence model and
state-of-the-art methods.
( 2
min )
The use of mini-batches of data in training artificial neural networks is
nowadays very common. Despite its broad usage, theories explaining
quantitatively how large or small the optimal mini-batch size should be are
missing. This work presents a systematic attempt at understanding the role of
the mini-batch size in training two-layer neural networks. Working in the
teacher-student scenario, with a sparse teacher, and focusing on tasks of
different complexity, we quantify the effects of changing the mini-batch size
$m$. We find that the generalization performance of the student often depends
strongly on $m$ and may undergo sharp phase transitions at a critical value
$m_c$, such that for $m < m_c$ the student fails to learn the teacher, while for
$m > m_c$ it learns the teacher perfectly or generalizes very well. Phase
transitions are induced by collective phenomena firstly discovered in
statistical mechanics and later observed in many fields of science. Observing a
phase transition by varying the mini-batch size across different architectures
raises several questions about the role of this hyperparameter in the neural
network learning process.
( 3
min )
This paper presents a comprehensive comparative analysis of the performance
of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks
(QNN), juxtaposed against their classical counterparts: Equivariant Neural
Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of
each network with two toy examples for a binary classification task, focusing
on model complexity (measured by the number of parameters) and the size of the
training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$
EQNN and the QNN provide superior performance for smaller parameter sets and
modest training data samples.
( 2
min )
Global optimization of decision trees has been shown to be promising in terms of
accuracy, size, and consequently human comprehensibility. However, many of the
methods used rely on general-purpose solvers for which scalability remains an
issue. Dynamic programming methods have been shown to scale much better because
they exploit the tree structure by solving subtrees as independent subproblems.
However, this only works when an objective can be optimized separately for
subtrees. We explore this relationship in detail and show the necessary and
sufficient conditions for such separability and generalize previous dynamic
programming approaches into a framework that can optimize any combination of
separable objectives and constraints. Experiments on five application domains
show the general applicability of this framework, while outperforming the
scalability of general-purpose solvers by a large margin.
( 2
min )
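The separability argument can be illustrated with a tiny dynamic program (a toy sketch, not the paper's framework): because misclassification counts add up over subtrees, each subtree is an independent, memoizable subproblem keyed on (example subset, remaining depth).

```python
from functools import lru_cache

# XOR labels on two binary features: no depth-1 tree fits, depth 2 is exact.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

@lru_cache(maxsize=None)
def best(indices, depth):
    # Minimum misclassifications on `indices` with a depth budget. Because the
    # objective is separable, left and right subtrees are solved independently.
    labels = [Y[i] for i in indices]
    leaf_err = min(labels.count(0), labels.count(1))   # majority-vote leaf
    if depth == 0 or leaf_err == 0:
        return leaf_err
    result = leaf_err
    for f in range(len(X[0])):                         # try every feature split
        left = tuple(i for i in indices if X[i][f] == 0)
        right = tuple(i for i in indices if X[i][f] == 1)
        if left and right:
            result = min(result, best(left, depth - 1) + best(right, depth - 1))
    return result

print(best(tuple(range(4)), 1), best(tuple(range(4)), 2))  # 2 0
```

A non-separable objective (e.g. a global constraint coupling leaves across subtrees) would break the independent-subproblem recursion, which is exactly the condition the paper characterizes.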
We propose Pgx, a suite of board game reinforcement learning (RL)
environments written in JAX and optimized for GPU/TPU accelerators. By
leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx
can efficiently scale to thousands of simultaneous simulations over
accelerators. In our experiments on a DGX-A100 workstation, we discovered that
Pgx can simulate RL environments 10-100x faster than existing implementations
available in Python. Pgx includes RL environments commonly used as benchmarks
in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx
offers miniature game sets and baseline models to facilitate rapid research
cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm
with Pgx environments. Overall, Pgx provides high-performance environment
simulators for researchers to accelerate their RL experiments. Pgx is available
at this http URL
( 2
min )
This paper presents Translatotron 3, a novel approach to unsupervised direct
speech-to-speech translation from monolingual speech-text datasets by combining
masked autoencoder, unsupervised embedding mapping, and back-translation.
Experimental results in speech-to-speech translation tasks between Spanish and
English show that Translatotron 3 outperforms a baseline cascade system,
reporting $18.14$ BLEU points improvement on the synthesized
Unpaired-Conversational dataset. In contrast to supervised approaches that
necessitate real paired data, or specialized modeling to replicate
para-/non-linguistic information such as pauses, speaking rates, and speaker
identity, Translatotron 3 showcases its capability to retain this information.
Audio samples
can be found at this http URL
( 2
min )
In this paper, we investigate the complexity of feed-forward neural networks
by examining the concept of functional equivalence, which suggests that
different network parameterizations can lead to the same function. We utilize
the permutation invariance property to derive a novel covering number bound for
the class of feedforward neural networks, which reveals that the complexity of
a neural network can be reduced by exploiting this property. We discuss the
extensions to convolutional neural networks, residual networks, and
attention-based models. We demonstrate that functional equivalence benefits
optimization, as overparameterized networks tend to be easier to train since
increasing network width leads to a diminishing volume of the effective
parameter space. Our findings offer new insights into overparameterization and
have significant implications for understanding generalization and optimization
in deep learning.
( 2
min )
The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced variant
of the Stochastic Gradient Descent (SGD) algorithm that requires a full gradient
of the objective function from time to time. In this paper, we remove
the necessity of a full gradient computation. This is achieved by using a
randomized reshuffling strategy and aggregating stochastic gradients obtained
in each epoch. The aggregated stochastic gradients serve as an estimate of a
full gradient in the SARAH algorithm. We provide a theoretical analysis of the
proposed approach and conclude the paper with numerical experiments that
demonstrate the efficiency of this approach.
( 2
min )
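A one-dimensional sketch of the idea (an illustrative least-squares setup, not the paper's exact algorithm): run the usual SARAH recursion over a reshuffled epoch, and reuse the average of the stochastic gradients collected during that epoch as the anchor for the next epoch, in place of a fresh full-gradient computation.

```python
import random

# Illustrative 1-D least squares: f_i(w) = 0.5 * (a_i * w - b_i)^2.
data = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0), (1.5, 2.5)]

def grad(i, w):
    a, b = data[i]
    return a * (a * w - b)

def sarah_shuffled(epochs=50, eta=0.1, w=0.0, seed=0):
    n = len(data)
    rng = random.Random(seed)
    v = sum(grad(i, w) for i in range(n)) / n      # one initial true full gradient
    for _ in range(epochs):
        agg = 0.0
        for i in rng.sample(range(n), n):          # random reshuffling
            w_prev, w = w, w - eta * v
            v += grad(i, w) - grad(i, w_prev)      # SARAH recursive gradient estimate
            agg += grad(i, w) / n
        v = agg          # epoch-aggregated gradients replace the full gradient
    return w

w_star = sum(a * b for a, b in data) / sum(a * a for a, _ in data)
print(round(sarah_shuffled(), 3), round(w_star, 3))
```

The aggregated estimate is slightly stale (each stochastic gradient was evaluated at a different point of the epoch's trajectory), which is the approximation the paper analyzes.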
We consider information-theoretic bounds on expected generalization error for
statistical learning problems in a networked setting. In this setting, there
are $K$ nodes, each with its own independent dataset, and the models from each
node have to be aggregated into a final centralized model. We consider both
simple averaging of the models as well as more complicated multi-round
algorithms. We give upper bounds on the expected generalization error for a
variety of problems, such as those with Bregman divergence or Lipschitz
continuous losses, that demonstrate an improved dependence of $1/K$ on the
number of nodes. These "per node" bounds are in terms of the mutual information
between the training dataset and the trained weights at each node, and are
therefore useful in describing the generalization properties inherent to having
communication or privacy constraints at each node.
( 2
min )
Anomaly detection is a challenging task for machine learning algorithms due
to the inherent class imbalance. It is costly and time-demanding to manually
analyse the observed data, so usually only a few known anomalies, if any, are
available. Inspired by generative models and the analysis of the hidden
activations of neural networks, we introduce a novel unsupervised anomaly
detection method called DA3D. Here, we use adversarial autoencoders to generate
anomalous counterexamples based on the normal data only. These artificial
anomalies used during training allow the detection of real, yet unseen
anomalies. With our novel generative approach, we transform the unsupervised
task of anomaly detection to a supervised one, which is more tractable by
machine learning and especially deep learning methods. DA3D surpasses the
performance of state-of-the-art anomaly detection methods in a purely
data-driven way, where no domain knowledge is required.
( 2
min )
We introduce a new family of neural network models called Convolutional
Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a
high degree of inherent interpretability. Their core building blocks are
Dynamic Alignment Units (DAUs), which linearly transform their input with
weight vectors that dynamically align with task-relevant patterns. As a result,
CoDA-Nets model the classification prediction through a series of
input-dependent linear transformations, allowing for linear decomposition of
the output into individual input contributions. Given the alignment of the
DAUs, the resulting contribution maps align with discriminative input patterns.
These model-inherent decompositions are of high visual quality and outperform
existing attribution methods under quantitative metrics. Further, CoDA-Nets
constitute performant classifiers, achieving results on par with ResNet and VGG
models on, e.g., CIFAR-10 and TinyImagenet.
( 2
min )
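The core DAU computation can be sketched in a few lines (an illustrative scalar-output toy, not the paper's network): the weight vector is itself a normalized linear function of the input, so the output decomposes exactly into per-input contributions.

```python
import math

def dau(x, A, b):
    # Dynamic Alignment Unit (scalar-output sketch): the weight vector w(x) is
    # a linear map of the input rescaled to unit norm, so the output w(x)^T x
    # decomposes exactly into per-input contributions w_j(x) * x_j.
    w = [sum(A[j][k] * x[k] for k in range(len(x))) + b[j] for j in range(len(x))]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0
    w = [v / norm for v in w]                   # dynamic, input-dependent weights
    contributions = [w[j] * x[j] for j in range(len(x))]
    return sum(contributions), contributions

x = [1.0, 2.0, -1.0]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]   # identity, for illustration
out, contrib = dau(x, A, [0.0, 0.0, 0.0])
print(round(out, 3), [round(c, 3) for c in contrib])
```

With the identity map and zero bias, w(x) = x/||x||, so the weights literally align with the input pattern and the contributions sum exactly to the output, which is the linear decomposition used for the attribution maps.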
Voxel-based multiple testing is widely used in neuroimaging data analysis.
Traditional false discovery rate (FDR) control methods often ignore the spatial
dependence among the voxel-based tests and thus suffer from substantial loss of
testing power. While recent spatial FDR control methods have emerged, their
validity and optimality remain questionable when handling the complex spatial
dependencies of the brain. Concurrently, deep learning methods have
revolutionized image segmentation, a task closely related to voxel-based
multiple testing. In this paper, we propose DeepFDR, a novel spatial FDR
control method that leverages unsupervised deep learning-based image
segmentation to address the voxel-based multiple testing problem. Numerical
studies, including comprehensive simulations and Alzheimer's disease FDG-PET
image analysis, demonstrate DeepFDR's superiority over existing methods.
DeepFDR not only excels in FDR control and effectively diminishes the false
nondiscovery rate, but also boasts exceptional computational efficiency highly
suited for tackling large-scale neuroimaging data.
( 2
min )
A default assumption in reinforcement learning (RL) and optimal control is
that observations arrive at discrete time points on a fixed clock cycle. Yet,
many applications involve continuous-time systems where the time
discretization, in principle, can be managed. The impact of time discretization
on RL methods has not been fully characterized in existing theory, but a more
detailed analysis of its effect could reveal opportunities for improving
data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation
for LQR systems and uncover a fundamental trade-off between approximation and
statistical error in value estimation. Importantly, these two errors behave
differently to time discretization, leading to an optimal choice of temporal
resolution for a given data budget. These findings show that managing the
temporal resolution can provably improve policy evaluation efficiency in LQR
systems with finite data. Empirically, we demonstrate the trade-off in
numerical simulations of LQR instances and standard RL benchmarks for
non-linear continuous control.
( 2
min )
Efficient training of large-scale graph neural networks (GNNs) has been
studied with a specific focus on reducing their memory consumption. Work by Liu
et al. (2022) proposed extreme activation compression (EXACT) which
demonstrated drastic reduction in memory consumption by performing quantization
of the intermediate activation maps down to using INT2 precision. They showed
little to no reduction in performance while achieving large reductions in GPU
memory consumption. In this work, we present an improvement to the EXACT
strategy by using block-wise quantization of the intermediate activation maps.
We experimentally analyze different block sizes and show further reduction in
memory consumption (>15%), and runtime speedup per epoch (about 5%) even when
performing extreme extents of quantization with similar performance trade-offs
as with the original EXACT. Further, we present a correction to the assumptions
on the distribution of intermediate activation maps in EXACT (assumed to be
uniform) and show improved variance estimations of the quantization and
dequantization steps.
( 2
min )
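Block-wise quantization can be sketched as a round trip (an illustrative sketch, not EXACT's implementation): each block stores its own minimum and scale, so 2-bit codes adapt to the local activation range instead of one global range.

```python
def quantize_blockwise(acts, block=4, bits=2):
    # Uniform quantization with per-block (min, scale): each block of
    # activations is mapped to integer codes in [0, 2^bits - 1].
    levels = (1 << bits) - 1
    blocks = []
    for s in range(0, len(acts), block):
        chunk = acts[s:s + block]
        lo, hi = min(chunk), max(chunk)
        scale = (hi - lo) / levels or 1.0
        codes = [round((v - lo) / scale) for v in chunk]
        blocks.append((lo, scale, codes))
    return blocks

def dequantize_blockwise(blocks):
    return [lo + c * scale for lo, scale, codes in blocks for c in codes]

# Two blocks with very different ranges: a single global scale would crush
# the small-valued block, but per-block scales keep both accurate.
acts = [0.1, 0.23, 0.15, 0.05, 9.0, 8.6, 9.5, 8.0]
deq = dequantize_blockwise(quantize_blockwise(acts))
err = max(abs(a - d) for a, d in zip(acts, deq))
print(round(err, 4))
```

The round-trip error stays bounded by half a block-local quantization step, whereas with one global (min, scale) over this data the step would be 9.45/3 ≈ 3.15 and the small activations would all collapse to the same code.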
Transfer learning and ensembling are two popular techniques for improving the
performance and robustness of neural networks. Due to the high cost of
pre-training, ensembles of models fine-tuned from a single pre-trained
checkpoint are often used in practice. Such models end up in the same basin of
the loss landscape, which we call the pre-train basin, and thus have limited
diversity. In this work, we show that ensembles trained from a single
pre-trained checkpoint may be improved by better exploring the pre-train basin,
however, leaving the basin results in losing the benefits of transfer learning
and in degradation of the ensemble quality. Based on the analysis of existing
exploration methods, we propose a more effective modification of the Snapshot
Ensembles (SSE) for the transfer learning setup, StarSSE, which results in stronger
ensembles and uniform model soups.
( 2
min )
This paper proposes a new easy-to-implement parameter-free gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
( 2
min )
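A one-dimensional sketch of the DoWG update as we read the description (the exact rule here is an assumption, so treat it as illustrative): the step size is the squared running distance estimate divided by the square root of a distance-weighted sum of squared gradients, and no learning rate is ever supplied.

```python
import math

def dowg(grad_fn, x0, steps=200, r_eps=1e-4):
    # Sketch of a DoWG-style parameter-free update (1-D): r tracks the distance
    # travelled from the start, v accumulates distance-weighted squared
    # gradients, and the step size r^2 / sqrt(v) adapts automatically.
    x, r, v = x0, r_eps, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        r = max(r, abs(x - x0))          # running distance estimate
        v += r * r * g * g               # distance-weighted gradient accumulator
        if v > 0.0:
            x -= (r * r / math.sqrt(v)) * g
    return x

# f(x) = (x - 3)^2, minimizer at 3; note no step size is tuned anywhere.
x_final = dowg(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_final, 2))   # near the minimizer at 3
```

The step size starts tiny (r is seeded with a small epsilon), grows geometrically as the iterate moves away from the start, and then stabilizes once the accumulated gradient mass dominates.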
Understanding a model's sensitivity to its training data is crucial but can
also be challenging and costly, especially during training. To simplify such
issues, we present the Memory-Perturbation Equation (MPE), which relates a
model's sensitivity to perturbations in its training data. Derived using Bayesian
principles, the MPE unifies existing sensitivity measures, generalizes them to
a wide variety of models and algorithms, and unravels useful properties
regarding sensitivities. Our empirical results show that sensitivity estimates
obtained during training can be used to faithfully predict generalization on
unseen test data. The proposed equation is expected to be useful for future
research on robust and adaptive learning.
( 2
min )
We revisit processes generated by iterated random functions driven by a
stationary and ergodic sequence. Such a process is called strongly stable if a
random initialization exists, for which the process is stationary and ergodic,
and for any other initialization, the difference of the two processes converges
to zero almost surely. Under some mild conditions on the corresponding
recursive map, without any condition on the driving sequence, we show the
strong stability of iterations. Several applications are surveyed such as
stochastic approximation and queuing. Furthermore, new results are deduced for
Langevin-type iterations with dependent noise and for multitype branching
processes.
( 2
min )
We develop a novel deep learning approach for pricing European basket options
written on assets that follow jump-diffusion dynamics. The option pricing
problem is formulated as a partial integro-differential equation, which is
approximated via a new implicit-explicit minimizing movement time-stepping
approach, involving approximation by deep, residual-type Artificial Neural
Networks (ANNs) for each time step. The integral operator is discretized via
two different approaches: a) a sparse-grid Gauss--Hermite approximation
following localised coordinate axes arising from singular value decompositions,
and b) an ANN-based high-dimensional special-purpose quadrature rule.
Crucially, the proposed ANN is constructed to ensure the asymptotic behavior of
the solution for large values of the underlyings and to yield outputs consistent
with a priori known qualitative properties of the solution.
The performance of the methods and their robustness with respect to the dimension
are assessed in a series of numerical experiments involving the Merton
jump-diffusion model.
( 2
min )
Fixed point lattice actions are designed to have continuum classical
properties unaffected by discretization effects and reduced lattice artifacts
at the quantum level. They provide a possible way to extract continuum physics
with coarser lattices, thereby making it possible to circumvent problems with critical
slowing down and topological freezing toward the continuum limit. A crucial
ingredient for practical applications is to find an accurate and compact
parametrization of a fixed point action, since many of its properties are only
implicitly defined. Here we use machine learning methods to revisit the
question of how to parametrize fixed point actions. In particular, we obtain a
fixed point action for four-dimensional SU(3) gauge theory using convolutional
neural networks with exact gauge invariance. The large operator space allows us
to find superior parametrizations compared to previous studies, a necessary
first step for future Monte Carlo simulations.
( 2
min )
This paper proposes a new easy-to-implement parameter-free gradient-based
optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is
efficient -- matching the convergence rate of optimally tuned gradient descent
in convex optimization up to a logarithmic factor without tuning any
parameters, and universal -- automatically adapting to both smooth and
nonsmooth problems. While popular algorithms following the AdaGrad framework
compute a running average of the squared gradients to use for normalization,
DoWG maintains a new distance-based weighted version of the running average,
which is crucial to achieve the desired properties. To complement our theory,
we also show empirically that DoWG trains at the edge of stability, and
validate its effectiveness on practical machine learning tasks.
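To make the update concrete, here is a minimal NumPy sketch of a DoWG-style iteration as suggested by the description above: a running distance estimate r and a distance-weighted sum of squared gradient norms v replace AdaGrad's plain running average. The initialization constant and exact scaling are our assumptions, not the authors' algorithm.

```python
import numpy as np

def dowg(grad, x0, steps=200, r_eps=1e-8):
    """DoWG-style update (sketch): a running distance estimate r and a
    distance-weighted sum of squared gradient norms v set the step size,
    with no tuned learning rate."""
    x = np.asarray(x0, dtype=float).copy()
    start = x.copy()
    r, v = r_eps, 0.0
    for _ in range(steps):
        g = grad(x)
        r = max(r, float(np.linalg.norm(x - start)))    # distance estimate
        v += r ** 2 * float(np.linalg.norm(g)) ** 2     # weighted running sum
        x = x - (r ** 2 / np.sqrt(v)) * g               # normalized step
    return x

# Tuning-free run on a simple convex quadratic f(x) = ||x - 1||^2 / 2.
x_star = dowg(lambda x: x - 1.0, x0=np.zeros(2))
```

On this toy quadratic the method approaches the minimizer without any step-size tuning, which is the parameter-free behavior the abstract claims.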
( 2
min )
Generalized linear regressions, such as logistic regressions or Poisson
regressions, are long-studied regression analysis approaches that are widely
employed in various classification problems. Our study
considers a stochastic generalized linear regression model as a stochastic
problem with chance constraints and tackles it using nonconvex programming
techniques. Clustering techniques and quantile estimation are also used to
estimate the random data's mean and variance-covariance matrix. Metrics for
measuring the performance of logistic regression are used to assess the model's
efficacy, including the F1 score, precision score, and recall score. The
results of the proposed algorithm were 1 to 2 percent better than the
ordinary logistic regression model on the same dataset under the above
assessment criteria.
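For reference, the three assessment criteria named above can be computed as follows; this is a standard, self-contained implementation, not the paper's exact evaluation pipeline.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```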
( 2
min )
Multi-view data arises frequently in modern network analysis, e.g., relations
of multiple types among individuals in social network analysis, longitudinal
measurements of interactions among observational units, and annotated networks
with noisy partial labeling of vertices. We study community detection in these
disparate settings via a unified theoretical framework, and investigate the
fundamental thresholds for community recovery. We characterize the mutual
information between the data and the latent parameters, provided the degrees
are sufficiently large. Based on this general result, (i) we derive a sharp
threshold for community detection in an inhomogeneous multilayer block model
\citep{chen2022global}, (ii) characterize a sharp threshold for weak recovery
in a dynamic stochastic block model \citep{matias2017statistical}, and (iii)
identify the limiting mutual information in an unbalanced partially labeled
block model. Our first two results are derived modulo coordinate-wise convexity
assumptions on specific functions -- we provide extensive numerical evidence
for their correctness. Finally, we introduce iterative algorithms based on
Approximate Message Passing for community detection in these problems.
( 2
min )
This paper proposes two methods for causal additive models with unobserved
variables (CAM-UV). CAM-UV assumes that the causal functions take the form of
generalized additive models and that latent confounders are present. First, we
propose a method that leverages prior knowledge for efficient causal discovery.
Then, we propose an extension of this method for inferring causality in time
series data. The original CAM-UV algorithm differs from other existing causal
function models in that it does not seek the causal order between observed
variables, but rather aims to identify the causes for each observed variable.
Therefore, the first proposed method in this paper utilizes prior knowledge,
such as understanding that certain variables cannot be causes of specific
others. Moreover, by incorporating the prior knowledge that causes precede
their effects in time, we extend the first algorithm to the second method for
causal discovery in time series data. We validate the first proposed method by
using simulated data to demonstrate that the accuracy of causal discovery
increases as more prior knowledge is accumulated. Additionally, we test the
second proposed method by comparing it with existing time series causal
discovery methods, using both simulated data and real-world data.
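To make the role of prior knowledge concrete, here is a hypothetical sketch of how "X cannot cause Y" constraints and temporal precedence could prune the candidate parent sets a CAM-UV-style search considers. All names and the pruning interface are illustrative, not the authors' implementation.

```python
from itertools import combinations

def candidate_parent_sets(variables, forbidden, timestamps=None, max_size=2):
    """Enumerate candidate parent sets for each variable after pruning with
    prior knowledge: `forbidden` holds (cause, effect) pairs ruled out by the
    user, and `timestamps` encodes the prior that causes precede effects."""
    allowed = {}
    for effect in variables:
        parents = [
            v for v in variables
            if v != effect
            and (v, effect) not in forbidden
            and (timestamps is None or timestamps[v] < timestamps[effect])
        ]
        allowed[effect] = [
            set(combo)
            for k in range(1, max_size + 1)
            for combo in combinations(parents, k)
        ]
    return allowed

pruned = candidate_parent_sets(
    ["x", "y", "z"],
    forbidden={("z", "x")},                 # prior: z cannot cause x
    timestamps={"x": 0, "y": 1, "z": 2},    # prior: causes precede effects
)
```

The more pairs the user can rule out, the smaller the search space, which mirrors the paper's finding that accuracy improves as prior knowledge accumulates.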
( 3
min )
In this paper, we propose a probabilistic reduced-dimensional vector
autoregressive (PredVAR) model to extract low-dimensional dynamics from
high-dimensional noisy data. The model utilizes an oblique projection to
partition the measurement space into a subspace that accommodates the
reduced-dimensional dynamics and a complementary static subspace. An optimal
oblique decomposition is derived for the best predictability regarding
prediction error covariance. Building on this, we develop an iterative PredVAR
algorithm using maximum likelihood and the expectation-maximization (EM)
framework. This algorithm alternately updates the estimates of the latent
dynamics and optimal oblique projection, yielding dynamic latent variables with
rank-ordered predictability and an explicit latent VAR model that is consistent
with the outer projection model. The superior performance and efficiency of the
proposed approach are demonstrated using data sets from a synthesized Lorenz
system and an industrial process from Eastman Chemical.
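The alternating structure of the algorithm can be illustrated with a simplified toy that alternates between fitting a latent VAR(1) on projected data and refitting the projection by least squares; the encoder is taken as the pseudo-inverse of the decoder, so their product is an oblique projection. This is a sketch under our own simplifications, not the paper's maximum-likelihood EM derivation.

```python
import numpy as np

def predvar_sketch(Y, k, iters=5, seed=0):
    """Alternate between (i) fitting a latent VAR(1) on projected data and
    (ii) refitting the decoder by least squares; the encoder P is the
    pseudo-inverse of the decoder C, so C @ P is an oblique projection."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((k, Y.shape[0]))      # initial encoder
    for _ in range(iters):
        X = P @ Y                                  # latent series (k x T)
        A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])   # latent VAR(1) dynamics
        C = Y @ np.linalg.pinv(X)                  # decoder: y ~ C x
        P = np.linalg.pinv(C)                      # refreshed encoder
    return P, A, C

# Synthetic data: a 1-D AR(1) latent observed through 3 noisy channels.
rng = np.random.default_rng(1)
x = np.zeros(300)
for t in range(299):
    x[t + 1] = 0.9 * x[t] + 0.1 * rng.standard_normal()
Y = np.outer([1.0, 0.5, -0.3], x) + 0.01 * rng.standard_normal((3, 300))
P, A, C = predvar_sketch(Y, k=1)
```

On this toy the recovered latent dynamics matrix A lands near the true AR coefficient 0.9, and C @ P is (numerically) idempotent, i.e., a projection onto the dynamic subspace.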
( 2
min )
Optimizing neural networks is a difficult task that is still not well
understood. On the other hand, fixed representation methods such as kernels and
random features have provable optimization guarantees but inferior performance
due to their inherent inability to learn the representations. In this paper, we
aim at bridging this gap by presenting a novel architecture called RedEx
(Reduced Expander Extractor) that is as expressive as neural networks and can
also be trained in a layer-wise fashion via a convex program with semi-definite
constraints and optimization guarantees. We also show that RedEx provably
surpasses fixed representation methods, in the sense that it can efficiently
learn a family of target functions which fixed representation methods cannot.
( 2
min )
Today, we’re excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using AWS Trainium and Inferentia-based instances through SageMaker can help users lower fine-tuning costs by up to 50% and deployment costs by 4.7x, while lowering per-token latency. […]
( 18
min )
Geospatial data is data about specific locations on the earth’s surface. It can represent a geographical area as a whole or it can represent an event associated with a geographical area. Analysis of geospatial data is sought after in a few industries. It involves understanding where the data exists from a spatial perspective and why […]
( 13
min )
An interdisciplinary team of researchers thinks health AI could benefit from some of the aviation industry’s long history of hard-won lessons that have created one of the safest activities today.
( 11
min )
The AI Podcast · DigitalPath’s Ethan Higgins On Using AI to Fight Wildfires – Ep. 211 DigitalPath is igniting change in the Golden State — using computer vision, generative adversarial networks and a network of thousands of cameras to detect signs of fire in real time. In the latest episode of NVIDIA’s AI Podcast, host Read article >
( 6
min )
Traditional relational databases struggle with unstructured data – the text, images, videos, and social media feeds that flood our modern world. But graph databases, with their unique structure, offer a powerful tool for taming this chaos and extracting valuable insights. Here’s how they bring a game-changing perspective to unstructured data analytics: Modeling relationships, not just… Read More »Graph databases: Unveiling the hidden connections in unstructured data
The post Graph databases: Unveiling the hidden connections in unstructured data appeared first on Data Science Central.
( 22
min )
Customer success stories illuminate how hardware accelerators speed necessary infrastructure to support all aspects of an accelerated AI and HPC computing datacenter.
The post Use cases show that on-package accelerators benefit HPC/AI workloads from computation to data movement and security appeared first on Data Science Central.
( 27
min )
OpenAI Whisper is an advanced automatic speech recognition (ASR) model with an MIT license. ASR technology finds utility in transcription services, voice assistants, and enhancing accessibility for individuals with hearing impairments. This state-of-the-art model is trained on a vast and diverse dataset of multilingual and multitask supervised data collected from the web. Its high accuracy […]
( 11
min )
The Global Health Drug Discovery Institute and Microsoft Research are using AI to innovate in life sciences by accelerating the development of new treatments for global infectious diseases like tuberculosis and COVID. Find out how.
The post GHDDI and Microsoft Research use AI technology to achieve significant progress in discovering new drugs to treat global infectious diseases appeared first on Microsoft Research.
( 11
min )
Indigenous languages are under threat. Some 3,000 — three-quarters of the total — could disappear before the end of the century, or one every two weeks, according to UNESCO. As part of a movement to protect such languages, New Zealand’s Te Hiku Media, a broadcaster focused on the Māori people’s indigenous language known as te Read article >
( 7
min )
Curiosity leads the way for this week’s featured In the NVIDIA Studio 3D artist, Brellias.
( 7
min )
We funded 10 teams from around the world to design ideas and tools to collectively govern AI. We summarize the innovations, outline our learnings, and call for researchers and engineers to join us as we continue this work.
( 6
min )
We study semi-supervised sequence generation tasks where labeled data are too
scarce to effectively finetune a model and at the same time few-shot prompting
of a large language model (LLM) has suboptimal performance. This happens when a
task, such as parsing, is expensive to annotate and also unfamiliar to a
pretrained LLM. In this paper, we present a discovery that student models
distilled from an in-context learned LLM can often generalize better than their
teacher on such tasks. Leveraging this finding, we present a new method --
multistage collaborative knowledge distillation from an LLM (MCKD) -- for such
tasks. MCKD first few-shot prompts an LLM to produce pseudolabels for unlabeled
data. At each intermediate knowledge distillation (KD) stage, a new pair of
students is trained on disjoint partitions of the pseudolabeled data. Each
student then produces new and improved pseudolabels for its unseen partition to
be used in the next stage of distillation. We demonstrate the advantage of
multistage cross-partition labeling on several syntactic and semantic parsing
tasks. On CRAFT biomedical parsing, for example, 3-stage MCKD with 50 labeled
examples outperforms the prompted LLM and vanilla KD by 7.5% and 3.7% parsing
F1, respectively, and matches the performance of supervised finetuning with 500
examples.
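The multistage cross-partition loop can be sketched as follows, with a toy threshold learner standing in for the student models and a deliberately corrupted pseudolabeler standing in for the LLM; all components here are illustrative stand-ins, not the paper's models.

```python
def mckd(unlabeled, llm_pseudolabel, train_student, stages=3):
    """Multistage cross-partition distillation (skeleton): an LLM seeds
    pseudolabels; at each stage two students train on disjoint halves and
    each relabels the half it never saw."""
    half = len(unlabeled) // 2
    part_a, part_b = unlabeled[:half], unlabeled[half:]
    labels_a = [llm_pseudolabel(x) for x in part_a]
    labels_b = [llm_pseudolabel(x) for x in part_b]
    for _ in range(stages):
        student_a = train_student(part_a, labels_a)      # sees only A
        student_b = train_student(part_b, labels_b)      # sees only B
        labels_b = [student_a(x) for x in part_b]        # label unseen half
        labels_a = [student_b(x) for x in part_a]
    return train_student(part_a + part_b, labels_a + labels_b)

def fit_threshold(xs, ys):
    """Toy 'student': pick the threshold minimizing training error."""
    best_t, best_err = None, float("inf")
    for t in sorted(xs) + [max(xs) + 1]:
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: x >= t

data = [-4, -3, -2, -1, 1, 2, 3, 4]
noisy_llm = lambda x: True if x == -1 else x >= 0   # one corrupted pseudolabel
final = mckd(data, noisy_llm, fit_threshold, stages=2)
```

Because each student generalizes from its own half, the cross-partition relabeling corrects the LLM's noisy pseudolabel, which is the "student beats teacher" effect the paper leverages.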
( 3
min )
This letter proposes a novel relaying framework, semantic-forward (SF), for
cooperative communications towards the sixth-generation (6G) wireless networks.
The SF relay extracts and transmits the semantic features, which reduces the
forwarding payload and improves the network robustness against intra-link
errors. Based on the theoretical basis for cooperative communications with side
information and the turbo principle, we design a joint source-channel coding
algorithm to iteratively exchange the extrinsic information for enhancing the
decoding gains at the destination. Surprisingly, simulation results indicate
that even in bad channel conditions, SF relaying can still effectively improve
the recovered information quality.
( 2
min )
Pedestrian intention prediction is crucial for autonomous driving. In
particular, knowing if pedestrians are going to cross in front of the
ego-vehicle is core to performing safe and comfortable maneuvers. Creating
accurate and fast models that predict such intentions from sequential images is
challenging. A factor contributing to this is the lack of datasets with diverse
crossing and non-crossing (C/NC) scenarios. We address this scarceness by
introducing a framework, named ARCANE, which allows programmatically generating
synthetic datasets consisting of C/NC video clip samples. As an example, we use
ARCANE to generate a large and diverse dataset named PedSynth. We show how
PedSynth complements widely used real-world datasets such as JAAD and PIE, thus
enabling more accurate models for C/NC prediction. Considering the onboard
deployment of C/NC prediction models, we also propose a deep model named
PedGNN, which is fast and has a very low memory footprint. PedGNN is based on a
GNN-GRU architecture that takes a sequence of pedestrian skeletons as input to
predict crossing intentions.
( 2
min )
In this paper we clarify the crucial difference between a deep neural network
and the Fourier series. For the multiple Fourier series of the periodization of
some radial functions on $\mathbb{R}^d$, Kuratsubo (2010) investigated the
behavior of the spherical partial sum, and discovered the third phenomenon
other than the well-known Gibbs-Wilbraham and Pinsky phenomena. In particular,
the third one exhibits prevention of pointwise convergence. In contrast, we
give a specific deep neural network and prove its pointwise convergence.
( 2
min )
The traditional role of the network layer is the transfer of packet replicas
from source to destination through intermediate network nodes. We present a
generative network layer that uses Generative AI (GenAI) at intermediate or
edge network nodes and analyze its impact on the required data rates in the
network. We conduct a case study where the GenAI-aided nodes generate images
from prompts that consist of substantially compressed latent representations.
The results from network flow analyses under image quality constraints show
that the generative network layer can achieve an improvement of more than 100%
in terms of the required data rate.
( 2
min )
The estimation of probability density functions is a nontrivial task that
in recent years has been tackled with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
( 2
min )
The Model Parameter Randomisation Test (MPRT) is widely acknowledged in the
eXplainable Artificial Intelligence (XAI) community for its well-motivated
evaluative principle: that the explanation function should be sensitive to
changes in the parameters of the model function. However, recent works have
identified several methodological caveats for the empirical interpretation of
MPRT. To address these caveats, we introduce two adaptations to the original
MPRT -- Smooth MPRT and Efficient MPRT, where the former minimises the impact
that noise has on the evaluation results through sampling and the latter
circumvents the need for biased similarity measurements by re-interpreting the
test through the explanation's rise in complexity, after full parameter
randomisation. Our experimental results demonstrate that these proposed
variants lead to improved metric reliability, thus enabling a more trustworthy
application of XAI methods.
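The sampling idea behind Smooth MPRT, as we read the description, can be sketched by averaging an attribution function over noise-perturbed copies of the input before any pre-/post-randomisation comparison; the function names and noise model below are our illustrative assumptions, not the Quantus implementation.

```python
import random

def smooth_explanation(explain, x, sigma=0.1, n=16, seed=0):
    """Average an attribution over noise-perturbed inputs, damping the
    evaluation noise that Smooth MPRT targets."""
    rng = random.Random(seed)
    acc = [0.0] * len(x)
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        attribution = explain(noisy)
        acc = [a + e for a, e in zip(acc, attribution)]
    return [a / n for a in acc]

# A linear 'explainer' whose smoothed attribution should stay near 2x.
smoothed = smooth_explanation(lambda z: [2.0 * zi for zi in z], [1.0, -1.0])
```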
( 2
min )
We present a large-scale empirical study of how choices of configuration
parameters affect performance in knowledge distillation (KD). An example of
such a KD parameter is the measure of distance between the predictions of the
teacher and the student, common choices for which include the mean squared
error (MSE) and the KL-divergence. Although scattered efforts have been made to
understand the differences between such options, the KD literature still lacks
a systematic study on their general effect on student performance. We take an
empirical approach to this question in this paper, seeking to find out the
extent to which such choices influence student performance across 13 datasets
from 4 NLP tasks and 3 student sizes. We quantify the cost of making
sub-optimal choices and identify a single configuration that performs well
across the board.
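The two distance choices named above can be written down directly; this is a generic formulation with temperature softening for the KL term, and the study's exact loss configuration may differ.

```python
import math

def kd_losses(teacher_logits, student_logits, temperature=2.0):
    """Two common teacher-student distances: KL divergence between
    temperature-softened distributions, and MSE between raw logits."""
    def softmax(zs, temp):
        m = max(zs)                                  # stabilize the exponentials
        exps = [math.exp((z - m) / temp) for z in zs]
        total = sum(exps)
        return [e / total for e in exps]
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    mse = sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / len(p)
    return kl, mse

kl_same, mse_same = kd_losses([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
kl_diff, mse_diff = kd_losses([2.0, 0.5, -1.0], [1.0, 1.0, 1.0])
```

Both distances vanish when the student matches the teacher exactly, but they weight disagreements differently, which is precisely the kind of configuration choice the study quantifies.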
( 2
min )
In this work, we propose an approach for improving the GCN for
predicting ratings in social networks. Our model extends the standard
GCN with several transformer layers. The main focus of the
paper is on the encoder architecture for node embedding in the network. Using
the embedding layer from the graph-based convolution layer, the attention
mechanism could rearrange the feature space to get a more efficient embedding
for the downstream task. The experiments showed that our proposed architecture
achieves better performance than GCN on the traditional link prediction task.
( 2
min )
Feature selection in noisy label scenarios remains an understudied topic. We
propose a novel genetic algorithm-based approach, the Noise-Aware
Multi-Objective Feature Selection Genetic Algorithm (NMFS-GA), for selecting
optimal feature subsets in binary classification with noisy labels. NMFS-GA
offers a unified framework for selecting feature subsets that are both accurate
and interpretable. We evaluate NMFS-GA on synthetic datasets with label noise,
a Breast Cancer dataset enriched with noisy features, and a real-world ADNI
dataset for dementia conversion prediction. Our results indicate that NMFS-GA
can effectively select feature subsets that improve the accuracy and
interpretability of binary classifiers in scenarios with noisy labels.
( 2
min )
We propose a Block Majorization Minimization method with Extrapolation (BMMe)
for solving a class of multi-convex optimization problems. The extrapolation
parameters of BMMe are updated using a novel adaptive update rule. By showing
that block majorization minimization can be reformulated as a block mirror
descent method, with the Bregman divergence adaptively updated at each
iteration, we establish subsequential convergence for BMMe. We use this method
to design efficient algorithms to tackle nonnegative matrix factorization
problems with the $\beta$-divergences ($\beta$-NMF) for $\beta\in [1,2]$. These
algorithms, which are multiplicative updates with extrapolation, benefit from
our novel results that offer convergence guarantees. We also empirically
illustrate the significant acceleration of BMMe for $\beta$-NMF through
extensive experiments.
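A minimal sketch of multiplicative updates for $\beta$-NMF evaluated at extrapolated points is below, with a *fixed* extrapolation weight; note that BMMe's contribution is a novel adaptive extrapolation rule with convergence guarantees, which this illustration does not reproduce.

```python
import numpy as np

def beta_nmf_mu(V, r, beta=1.5, iters=100, gamma=0.3, seed=0):
    """Multiplicative updates for beta-NMF applied at extrapolated points
    (fixed weight gamma; BMMe's adaptive rule is not reproduced here)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = rng.random((m, r)) + 0.1, rng.random((r, n)) + 0.1
    W_prev, H_prev = W.copy(), H.copy()
    eps = 1e-12
    for _ in range(iters):
        # extrapolated points, clipped to stay nonnegative
        We = np.maximum(W + gamma * (W - W_prev), eps)
        He = np.maximum(H + gamma * (H - H_prev), eps)
        W_prev, H_prev = W, H
        WH = We @ He + eps
        H = He * (We.T @ (V * WH ** (beta - 2))) / (We.T @ WH ** (beta - 1) + eps)
        WH = We @ H + eps
        W = We * ((V * WH ** (beta - 2)) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((6, 5))
W, H = beta_nmf_mu(V, r=2)
```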
( 2
min )
This paper describes the use of connectionist techniques in phonetic speech
recognition with strong latency constraints. The constraints are imposed by the
task of deriving the lip movements of a synthetic face in real time from the
speech signal, by feeding the phonetic string into an articulatory synthesiser.
Particular attention has been paid to analysing the interaction between the
time evolution model learnt by the multi-layer perceptrons and the transition
model imposed by the Viterbi decoder, in different latency conditions. Two
experiments were conducted in which the time dependencies in the language model
(LM) were controlled by a parameter. The results show a strong interaction
between the three factors involved, namely the neural network topology, the
length of time dependencies in the LM and the decoder latency.
( 2
min )
Speech has long been a barrier to effective communication and connection,
persisting as a challenge in our increasingly interconnected world. This
research paper introduces a transformative solution to this persistent
obstacle: an end-to-end speech conversion framework tailored for Hindi-to-English
translation, culminating in the synthesis of English audio. By integrating
cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech
recognition (ASR), mBART for neural machine translation (NMT), and a
Text-to-Speech (TTS) synthesis component, this framework offers a unified and
seamless approach to cross-lingual communication. We delve into the intricate
details of each component, elucidating their individual contributions and
exploring the synergies that enable a fluid transition from spoken Hindi to
synthesized English audio.
( 2
min )
This paper introduces Qrlew, an open source library that can parse SQL
queries into Relations -- an intermediate representation -- that keeps track of
rich data types, value ranges, and row ownership; so that they can easily be
rewritten into a differentially private equivalent and turned back into SQL
queries for execution in a variety of standard data stores.
With Qrlew, a data practitioner can express their data queries in standard
SQL; the data owner can run the rewritten query without any technical
integration and with strong privacy guarantees on the output; and the query
rewriting can be operated by a privacy expert who must be trusted by the owner,
but may belong to a separate organization.
( 2
min )
To make antenna design on printed circuit boards (PCBs) accessible to more
interested engineers, we propose a simple method that models PCB antennas with
a few basic components. By taking two separate steps to decide their geometric
dimensions and positions, antenna prototypes can be produced with no
experience required. Random-sampling statistics related to the quality of
dimensions are used to select among dimension candidates. A novel
image-based classifier using a convolutional neural network (CNN) is introduced
to further determine the positions of these fixed-dimension components. Two
examples from wearable products have been chosen to examine the entire
workflow. Their final designs are realistic and their performance metrics are
not inferior to the ones designed by experienced engineers.
( 2
min )
We introduce a probabilistic technique for full-waveform inversion, employing
variational inference and conditional normalizing flows to quantify uncertainty
in migration-velocity models and its impact on imaging. Our approach integrates
generative artificial intelligence with physics-informed common-image gathers,
reducing reliance on accurate initial velocity models. The case studies
considered demonstrate its efficacy in producing realizations of migration-velocity models
conditioned by the data. These models are used to quantify amplitude and
positioning effects during subsequent imaging.
( 2
min )
Generative models of macromolecules carry abundant and impactful implications
for industrial and biomedical efforts in protein engineering. However, existing
methods are currently limited to modeling protein structures or sequences,
independently or jointly, without regard to the interactions that commonly
occur between proteins and other macromolecules. In this work, we introduce
MMDiff, a generative model that jointly designs sequences and structures of
nucleic acid and protein complexes, independently or in complex, using joint
SE(3)-discrete diffusion noise. Such a model has important implications for
emerging areas of macromolecular design including structure-based transcription
factor design and design of noncoding RNA sequences. We demonstrate the utility
of MMDiff through a rigorous new design benchmark for macromolecular complex
generation that we introduce in this work. Our results demonstrate that MMDiff
is able to successfully generate micro-RNA and single-stranded DNA molecules
while being modestly capable of jointly modeling DNA and RNA molecules in
interaction with multi-chain protein complexes. Source code:
https://github.com/Profluent-Internships/MMDiff.
( 2
min )
De novo drug design is a pivotal issue in pharmacology and a new area of
focus in AI for science research. A central challenge in this field is to
generate molecules with specific properties while also producing a wide range
of diverse candidates. Although advanced technologies such as transformer
models and reinforcement learning have been applied in drug design, their
potential has not been fully realized. Therefore, we propose MolRL-MGPT, a
reinforcement learning algorithm with multiple GPT agents for drug molecular
generation. To promote molecular diversity, we encourage the agents to
collaborate in searching for desirable molecules in diverse directions. Our
algorithm has shown promising results on the GuacaMol benchmark and exhibits
efficacy in designing inhibitors against SARS-CoV-2 protein targets. The codes
are available at: https://github.com/HXYfighter/MolRL-MGPT.
( 2
min )
This paper introduces our system submission for the Cadenza ICASSP 2024 Grand
Challenge, which presents the problem of remixing and enhancing music for
hearing aid users. Our system placed first in the challenge, achieving the best
average Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data
set. We describe the system, which uses an ensemble of deep learning music
source separators that are fine-tuned on the challenge data. We demonstrate the
effectiveness of our system through the challenge results and analyze the
importance of different system aspects through ablation studies.
( 2
min )
Document representation is at the core of many NLP tasks in machine
understanding. A general representation learned in an unsupervised manner
preserves generality and can be used for various applications. In practice,
sentiment analysis (SA) has been a challenging task that is regarded as
deeply semantic-related and is often used to assess general representations.
Existing methods on unsupervised document representation learning can be
separated into two families: sequential ones, which explicitly take the
ordering of words into consideration, and non-sequential ones, which do not
explicitly do so. However, both of them suffer from their own weaknesses. In
this paper, we propose a model that overcomes difficulties encountered by both
families of methods. Experiments show that our model outperforms
state-of-the-art methods on popular SA datasets and a fine-grained aspect-based
SA by a large margin.
( 2
min )
While significant advancements have been made in the field of fair machine
learning, the majority of studies focus on scenarios where the decision model
operates on a static population. In this paper, we study fairness in dynamic
systems where sequential decisions are made. Each decision may shift the
underlying distribution of features or user behavior. We model the dynamic
system through a Markov Decision Process (MDP). By acknowledging that
traditional fairness notions and long-term fairness are distinct requirements
that may not necessarily align with one another, we propose an algorithmic
framework to integrate various fairness considerations with reinforcement
learning using both pre-processing and in-processing approaches. Three case
studies show that our method can strike a balance between traditional fairness
notions, long-term fairness, and utility.
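As a toy illustration of the in-processing idea, one can shape the per-step RL reward with a fairness penalty; the specific penalty below (a demographic-parity-style gap in positive-decision rates) is our illustrative choice, not necessarily one of the paper's fairness notions.

```python
def fair_reward(reward, group_pos_rates, lam=1.0):
    """Penalize the per-step RL reward by the gap in positive-decision
    rates between two demographic groups; lam trades utility for fairness."""
    gap = abs(group_pos_rates[0] - group_pos_rates[1])
    return reward - lam * gap

# A utility of 1.0 shaped by a 0.3 rate gap with fairness weight 2.0.
shaped = fair_reward(1.0, group_pos_rates=(0.8, 0.5), lam=2.0)
```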
( 2
min )
Realistic synthetic tabular data generation encounters significant challenges
in preserving privacy, especially when dealing with sensitive information in
domains like finance and healthcare. In this paper, we introduce
Federated Tabular Diffusion (FedTabDiff) for generating high-fidelity
mixed-type tabular data without centralized access to the original tabular
datasets. Leveraging the strengths of Denoising Diffusion Probabilistic
Models (DDPMs), our approach addresses the inherent complexities in tabular
data, such as mixed attribute types and implicit relationships. More
critically, FedTabDiff realizes a decentralized learning scheme that permits
multiple entities to collaboratively train a generative model while respecting
data privacy and locality. We extend DDPMs into the federated setting for
tabular data generation, which includes a synchronous update scheme and
weighted averaging for effective model aggregation. Experimental evaluations on
real-world financial and medical datasets attest to the framework's capability
to produce synthetic data that maintains high fidelity, utility, privacy, and
coverage.
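The weighted-averaging aggregation step mentioned above can be sketched generically as a FedAvg-style average weighted by client dataset sizes; the parameter layout and synchronization details of FedTabDiff are simplified away here.

```python
def weighted_average(client_params, client_sizes):
    """FedAvg-style aggregation: each parameter is averaged across clients
    in proportion to the client's local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients, the second holding three times as much data.
agg = weighted_average([[1.0, 0.0], [3.0, 2.0]], client_sizes=[1, 3])
```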
( 2
min )
This paper tackles the challenge of automatically assessing physical
rehabilitation exercises for patients who perform the exercises without
clinician supervision. The objective is to provide a quality score to ensure
correct performance and achieve desired results. To achieve this goal, a new
graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with
Transformer, is introduced. This model combines a modified version of STGCN and
transformer architectures for efficient handling of spatio-temporal data. The
key idea is to represent skeleton data as a graph, respecting its non-linear
structure, and to detect the joints that play the main role in each
rehabilitation exercise. Dense connections and GRU mechanisms are used to rapidly process
large 3D skeleton inputs and effectively model temporal dynamics. The
transformer encoder's attention mechanism focuses on relevant parts of the
input sequence, making it useful for evaluating rehabilitation exercises. The
evaluation of our proposed approach on the KIMORE and UI-PRMD datasets
highlighted its potential, surpassing state-of-the-art methods in terms of
accuracy and computational time. This resulted in faster and more accurate
learning and assessment of rehabilitation exercises. Additionally, our model
provides valuable feedback through qualitative illustrations, effectively
highlighting the significance of joints in specific exercises.
( 3
min )
Large Language Models (LLMs) hold transformative potential in aviation,
particularly in reconstructing flight trajectories. This paper investigates
this potential, grounded in the notion that LLMs excel at processing sequential
data and deciphering complex data structures. Utilizing the LLaMA 2 model, a
pre-trained open-source LLM, the study focuses on reconstructing flight
trajectories using Automatic Dependent Surveillance-Broadcast (ADS-B) data with
irregularities inherent in real-world scenarios. The findings demonstrate the
model's proficiency in filtering noise and estimating both linear and curved
flight trajectories. However, the analysis also reveals challenges in managing
longer data sequences, which may be attributed to the token-length limitations
of LLMs. The study's insights underscore the promise of LLMs in flight
trajectory reconstruction and open new avenues for their broader application
across the aviation and transportation sectors.
( 2
min )
The estimation of probability density functions is a non-trivial task that
has been tackled in recent years with machine learning techniques.
Successful applications can be obtained using models inspired by the Boltzmann
machine (BM) architecture. In this manuscript, the product Jacobi-Theta
Boltzmann machine (pJTBM) is introduced as a restricted version of the
Riemann-Theta Boltzmann machine (RTBM) with diagonal hidden sector connection
matrix. We show that score matching, based on the Fisher divergence, can be
used to fit probability densities with the pJTBM more efficiently than with the
original RTBM.
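For readers unfamiliar with score matching: it fits an unnormalized density by minimizing the Fisher divergence between the data density $p$ and the model density $q_\theta$, which sidesteps the intractable normalization constant. The standard formulation (Hyvärinen's general statement, not taken from this paper) is:

```latex
% Fisher divergence between data density p and model q_\theta:
F(p \,\|\, q_\theta) = \tfrac{1}{2}\int p(x)\,
  \bigl\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \bigr\|^2 \, dx
% Equivalent (up to a constant in \theta) score-matching objective,
% requiring only derivatives of the unnormalized log-density:
J(\theta) = \int p(x) \sum_i \Bigl[
  \partial_i^2 \log q_\theta(x)
  + \tfrac{1}{2}\bigl(\partial_i \log q_\theta(x)\bigr)^2 \Bigr] dx + \text{const.}
```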
( 2
min )
We’re working to prevent abuse, provide transparency on AI-generated content, and improve access to accurate voting information.
( 3
min )
Generative Artificial Intelligence (AI) is one of the most exciting
developments in Computer Science of the last decade. At the same time,
Reinforcement Learning (RL) has emerged as a very successful paradigm for a
variety of machine learning tasks. In this survey, we discuss the state of the
art, opportunities and open research questions in applying RL to generative AI.
In particular, we will discuss three types of applications, namely, RL as an
alternative way for generation without specified objectives; as a way for
generating outputs while concurrently maximizing an objective function; and,
finally, as a way of embedding desired characteristics, which cannot be easily
captured by means of an objective function, into the generative process. We
conclude the survey with an in-depth discussion of the opportunities and
challenges in this fascinating emerging area.
( 2
min )
In this work, we introduce AutoFragDiff, a fragment-based autoregressive
diffusion model for generating 3D molecular structures conditioned on target
protein structures. We employ geometric vector perceptrons to predict atom
types and spatial coordinates of new molecular fragments conditioned on
molecular scaffolds and protein pockets. Our approach improves the local
geometry of the resulting 3D molecules while maintaining high predicted binding
affinity to protein targets. The model can also perform scaffold extension from
a user-provided starting molecular scaffold.
( 2
min )
Auditory spatial attention detection (ASAD) is used to determine the
direction of a listener's attention to a speaker by analyzing her/his
electroencephalographic (EEG) signals. This study aimed to further improve the
performance of ASAD with a short decision window (i.e., <1 s) rather than with
the long decision windows used in previous studies. An end-to-end temporal attention
network (i.e., TAnet) was introduced in this work. TAnet employs a multi-head
attention (MHA) mechanism, which can more effectively capture the interactions
among time steps in collected EEG signals and efficiently assign corresponding
weights to those EEG time steps. Experiments demonstrated that, compared with
the CNN-based method and recent ASAD methods, TAnet provided improved decoding
performance on the KUL dataset with short decision windows (<1 s), achieving
accuracies of 92.4% (0.1 s decision window), 94.9% (0.25 s), 95.1% (0.3 s),
95.4% (0.4 s), and 95.5% (0.5 s). As a new ASAD model with a short
decision window, TAnet can potentially facilitate the design of EEG-controlled
intelligent hearing aids and sound recognition systems.
( 2
min )
While fingerprinting localization is favored for its effectiveness, it is
hindered by high data acquisition costs and the inaccuracy of static
database-based estimates. Addressing these issues, this letter presents an
innovative indoor localization method using a data-efficient meta-learning
algorithm. This approach, grounded in the "Learning to Learn" paradigm of
meta-learning, utilizes historical localization tasks to improve adaptability
and learning efficiency in dynamic indoor environments. We introduce a
task-weighted loss to enhance knowledge transfer within this framework. Our
comprehensive experiments confirm the method's robustness and superiority over
current benchmarks, achieving a notable 23.13% average gain in Mean Euclidean
Distance, particularly effective in scenarios with limited CSI data.
( 2
min )
Idealized first-principles models of chemical plants can be inaccurate. An
alternative is to fit a Machine Learning (ML) model directly to plant sensor
data. We use a structured approach: Each unit within the plant gets represented
by one ML model. After fitting the models to the data, the models are connected
into a flowsheet-like directed graph. We find that for smaller plants, this
approach works well, but for larger plants, the complex dynamics arising from
large and nested cycles in the flowsheet lead to instabilities in the solver
during model initialization. We show that a high accuracy of the single-unit
models is not enough: The gradient can point in unexpected directions, which
prevents the solver from converging to the correct stationary state. To address
this problem, we present a way to fine-tune ML models such that initialization,
even with very simple solvers, becomes robust.
( 3
min )
This research examines the polycentric governance of digital assets in
blockchain-based Decentralized Autonomous Organizations (DAOs). It offers a
theoretical framework and addresses a critical challenge facing decentralized
governance by developing a method to identify sybils, or spurious identities.
The method uses graph deep learning techniques to identify sybil activity in a
DAO governance dataset (snapshot.org). Specifically, a Graph Convolutional
Neural Network (GCNN) learned voting behaviours and a fast k-means vector
clustering algorithm (FAISS) used the high-dimensional embeddings to identify
similar nodes in a graph. The results reveal that deep learning can effectively
identify sybils, reducing the voting graph by 2-5%. This research underscores
the importance of sybil resistance in DAOs and offers a novel perspective on
decentralized governance, informing future policy, regulation, and governance
practices.
( 2
min )
Federated learning (FL) emphasizes decentralized training by storing data
locally and sending only model updates, safeguarding user privacy. Recently, a
line of work on privacy attacks has compromised user privacy by extracting
sensitive training text from language models in the context of FL. Yet, these attack
techniques face distinct hurdles: some work chiefly with limited batch sizes
(e.g., batch size of 1), and others are easily detectable. This paper
introduces an innovative approach that is challenging to detect, significantly
enhancing the recovery rate of text in various batch-size settings. Building on
fundamental gradient matching and domain prior knowledge, we enhance the attack
by recovering the input of the Pooler layer of language models, which enables
us to provide additional supervised signals at the feature level. Unlike
gradient data, these signals do not average across sentences and tokens,
thereby offering more nuanced and effective insights. We benchmark our method
using text classification tasks on datasets such as CoLA, SST-2, and Rotten
Tomatoes. Across different batch sizes and models, our approach consistently
outperforms previous state-of-the-art results.
( 2
min )
We consider the problem of sequentially learning to estimate, in the mean
squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by
observing only $m < K$ of its entries in each round. We first establish a
concentration bound for MSE estimation. We then frame the estimation problem
with bandit feedback, and propose a variant of the successive elimination
algorithm. We also derive a minimax lower bound to understand the fundamental
limit on the sample complexity of this problem.
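As context, successive elimination maintains a set of candidates, samples each per round, and prunes those whose confidence interval is dominated. The sketch below shows only the generic bandit version of that loop (best-entry identification under Gaussian noise); the paper's variant, which targets MSE estimation under m < K partial observations per round, is more involved. All names and constants here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def successive_elimination(means, sigma, rounds, delta=0.05):
    """Generic successive elimination over K 'arms' (here: vector entries).

    Samples every active arm each round and eliminates arms whose upper
    confidence bound falls below the best lower confidence bound.
    """
    K = len(means)
    active = list(range(K))
    sums = np.zeros(K)
    counts = np.zeros(K)
    for t in range(1, rounds + 1):
        for k in active:
            sums[k] += means[k] + sigma * rng.standard_normal()
            counts[k] += 1
        est = sums[active] / counts[active]
        # Hoeffding-style confidence radius with a union bound over arms/rounds
        rad = sigma * np.sqrt(2 * np.log(2 * K * t**2 / delta) / counts[active])
        best_lcb = np.max(est - rad)
        active = [k for k, e, r in zip(active, est, rad) if e + r >= best_lcb]
    return active
```

With well-separated means, only the best entry survives the elimination rounds.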
( 2
min )
Diffuse correlation spectroscopy (DCS) is an emerging noninvasive technique
that measures tissue blood flow by using near-infrared coherent
point-source illumination to detect spectral changes. While machine learning
has demonstrated significant potential for measuring blood flow index (BFi), an
open question concerning the success of this approach pertains to its
robustness in scenarios involving deviations between datasets with varying
Signal-to-Noise Ratios (SNRs) originating from diverse clinical applications
and various setups. This study proposes a transfer learning approach that aims
to assess the influence of SNRs on the generalization ability of learned
features and to demonstrate the robustness of transfer learning. A synthetic dataset with
varying levels of added noise is utilized to simulate different SNRs. The
proposed network takes a 1x64 autocorrelation curve as input and generates BFi
and the correlation parameter beta. The proposed model demonstrates excellent
performance across different SNRs, exhibiting enhanced fitting accuracy,
particularly for low SNR datasets when compared with other fitting methods.
This highlights its potential for clinical diagnosis and treatment across
various scenarios under different clinical setups.
( 2
min )
This paper explores the application of Shapley Value Regression in dissecting
marketing performance at channel-partner level, complementing channel-level
Marketing Mix Modeling (MMM). Utilizing real-world data from the financial
services industry, we demonstrate the practicality of Shapley Value Regression
in evaluating individual partner contributions. Although structured in-field
testing along with cooperative game theory is most accurate, it can often be
highly complex and expensive to conduct. Shapley Value Regression is thus a
more feasible approach to disentangle the influence of each marketing partner
within a marketing channel. We also propose a simple method to derive adjusted
coefficients of Shapley Value Regression and compare it with alternative
approaches.
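To make the idea concrete, here is a minimal sketch (not the authors' implementation) of Shapley Value Regression for a handful of predictors: each predictor's Shapley value is its average marginal contribution to the model's R² across all orderings of predictors. The helper names are invented for illustration, and the factorial enumeration is feasible only for a small number of predictors.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the given feature columns (with intercept)."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def shapley_r2(X, y):
    """Shapley value of each feature's contribution to R^2,
    averaged over all orderings of the features."""
    p = X.shape[1]
    vals = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        used = []
        for j in perm:
            base = r_squared(X, y, used)
            used.append(j)
            vals[j] += r_squared(X, y, used) - base
    return vals / len(perms)
```

By the efficiency property, the values sum exactly to the full-model R², which yields a natural attribution of channel performance to individual partners.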
( 2
min )
Representation learning frameworks for unlabeled time series have been
proposed for medical signal processing. Despite the considerable progress made
in previous works, we observe that the representations extracted from these
time series still do not generalize well. In this paper, we
present a Time series (medical signal) Representation Learning framework via
Spectrogram (TRLS) to get more informative representations. We transform the
input time-domain medical signals into spectrograms and design a time-frequency
encoder named Time Frequency RNN (TFRNN) to capture more robust multi-scale
representations from the augmented spectrograms. Our TRLS takes spectrograms
as input with two different types of data augmentation and maximizes the
similarity between positive pairs, which effectively circumvents the problem of
designing negative samples. Our evaluation on four real-world medical signal
classification datasets shows that TRLS is superior
to the existing frameworks.
( 2
min )
Electromyogram (EMG)-based hand gesture recognition systems are a promising
technology for human/machine interfaces. However, one of their main limitations
is the long calibration time that is typically required to handle new users.
The paper discusses and analyses the challenge of cross-subject generalization
using an original dataset containing the EMG signals of 14 human subjects
during hand gestures. The experimental results show that, though an accurate
generalization based on pooling multiple subjects is hardly achievable, it is
possible to improve the cross-subject estimation by identifying a robust
low-dimensional subspace for multiple subjects and aligning it to a target
subject. A visualization of the subspace enables us to provide insights for the
improvement of cross-subject generalization with EMG signals.
( 2
min )
Alternative data representations are powerful tools that augment the
performance of downstream models. However, there is an abundance of such
representations within the machine learning toolbox, and the field lacks a
comparative understanding of the suitability of each representation method.
In this paper, we propose artifact detection and classification within EEG
data as a testbed for profiling image-based data representations of time series
data. We then evaluate eleven popular deep learning architectures on each of
six commonly-used representation methods.
We find that, while the choice of representation entails a tradeoff between
bias and variance, certain representations are practically
more effective in highlighting features which increase the signal-to-noise
ratio of the data. We present our results on EEG data, and open-source our
testing framework to enable future comparative analyses in this vein.
( 2
min )
This paper describes an architecture for predicting the price of
cryptocurrencies for the next seven days using the Adaptive Network Based Fuzzy
Inference System (ANFIS). Historical data of cryptocurrencies and indexes that
are considered are Bitcoin (BTC), Ethereum (ETH), Bitcoin Dominance (BTC.D),
and Ethereum Dominance (ETH.D) on a daily timeframe. The training methods used
are hybrid and backpropagation algorithms, together with the grid partition,
subtractive clustering, and Fuzzy C-means (FCM) algorithms for data clustering.
The performance of the architecture designed in this
paper has been compared with different inputs and neural network models in
terms of statistical evaluation criteria. Finally, the proposed method can
predict the price of digital currencies in a short time.
( 2
min )
Spectral lightcurves consisting of time series single-pixel spectral
measurements of spacecraft are used to infer the spacecraft's attitude and
rotation. Two methods are used: one based on numerical optimisation of a
regularised least-squares cost function, and another based on machine learning
with a neural network model. The aim is to work with minimal information, so
no prior is available on either the attitude or the inertia tensor. The
theoretical and practical aspects of this task are investigated, and the
methodology is tested on synthetic data.
( 2
min )
The field of antibody-based therapeutics has grown significantly in recent
years, with targeted antibodies emerging as a potentially effective approach to
personalized therapies. Such therapies could be particularly beneficial for
complex, highly individual diseases such as cancer. However, progress in this
field is often constrained by the extensive search space of amino acid
sequences that form the foundation of antibody design. In this study, we
introduce a novel reinforcement learning method specifically tailored to
address the unique challenges of this domain. We demonstrate that our method
can learn the design of high-affinity antibodies against multiple targets in
silico, utilizing either online interaction or offline datasets. To the best of
our knowledge, our approach is the first of its kind and outperforms existing
methods on all tested antigens in the Absolut! database.
( 2
min )
The advent of Generative AI has marked a significant milestone in artificial
intelligence, demonstrating remarkable capabilities in generating realistic
images, texts, and data patterns. However, these advancements come with
heightened concerns over data privacy and copyright infringement, primarily due
to the reliance on vast datasets for model training. Traditional approaches
like differential privacy, machine unlearning, and data poisoning only offer
fragmented solutions to these complex issues. Our paper delves into the
multifaceted challenges of privacy and copyright protection within the data
lifecycle. We advocate for integrated approaches that combine technical
innovation with ethical foresight, holistically addressing these concerns by
investigating and devising solutions that are informed by the lifecycle
perspective. This work aims to catalyze a broader discussion and inspire
concerted efforts towards data privacy and copyright integrity in Generative
AI.
( 2
min )
Automated sleep stage classification using raw single-channel EEG is a
critical tool for sleep quality assessment and disorder diagnosis. However,
modelling the complexity and variability inherent in this signal is a
challenging task, limiting such systems' practicality and effectiveness in clinical
settings. To mitigate these challenges, this study presents an end-to-end deep
learning (DL) model which integrates squeeze and excitation blocks within the
residual network to extract features and stacked Bi-LSTM to understand complex
temporal dependencies. A distinctive aspect of this study is the adaptation of
GradCam for sleep staging, marking the first instance of an explainable DL
model in this domain with alignment of its decision-making with sleep expert's
insights. We evaluated our model on the publicly available datasets
(SleepEDF-20, SleepEDF-78, and SHHS), achieving Macro-F1 scores of 82.5, 78.9,
and 81.9, respectively. Additionally, a novel training efficiency enhancement
strategy was implemented by increasing stride size, leading to 8x faster
training times with minimal impact on performance. Comparative analyses
underscore that our model outperforms all existing baselines, indicating its
potential for clinical usage.
( 3
min )
Photoplethysmography (PPG) refers to the measurement of variations in blood
volume using light and is a feature of most wearable devices. The PPG signals
provide insight into the body's circulatory system and can be employed to
extract various bio-features, such as heart rate and vascular ageing. Although
several algorithms have been proposed for this purpose, many exhibit
limitations, including heavy reliance on human calibration, high signal quality
requirements, and a lack of generalisation. In this paper, we introduce a PPG
signal processing framework that integrates graph theory and computer vision
algorithms, to provide an analysis framework which is amplitude-independent and
invariant to affine transformations. It also requires minimal preprocessing,
fuses information through RGB channels and exhibits robust generalisation
across tasks and datasets. The proposed VGTL-net achieves state-of-the-art
performance in the prediction of vascular ageing and demonstrates robust
estimation of continuous blood pressure waveforms.
( 2
min )
For efficient neural network inference, it is desirable to achieve
state-of-the-art accuracy with the simplest networks requiring the least
computation, memory, and power. Quantizing networks to lower precision is a
powerful technique for simplifying networks. As each layer of a network may
have different sensitivity to quantization, mixed precision quantization
methods selectively tune the precision of individual layers to achieve a
minimum drop in task performance (e.g., accuracy). To estimate the impact of
layer precision choice on task performance, two methods are introduced: i)
Entropy Approximation Guided Layer selection (EAGL) is fast and uses the
entropy of the weight distribution, and ii) Accuracy-aware Layer Precision
Selection (ALPS) is straightforward and relies on single epoch fine-tuning
after layer precision reduction. Using EAGL and ALPS for layer precision
selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit
layers for ResNet-50, ResNet-101 and BERT-base transformer networks,
demonstrating enhanced performance across the entire accuracy-throughput
frontier. The techniques demonstrate better performance than existing
techniques in several commensurate comparisons. Notably, this is accomplished
with significantly less computational time required to reach a solution.
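The entropy heuristic behind EAGL can be sketched in a few lines: estimate the Shannon entropy of each layer's empirical weight distribution and treat low-entropy layers as the cheapest to quantize aggressively. This is an illustrative reading of the idea, not the paper's exact procedure; the function names, histogram range, and bin count are arbitrary choices.

```python
import numpy as np

def weight_entropy(w, bins=256, value_range=(-1.0, 1.0)):
    """Shannon entropy (bits) of a layer's empirical weight distribution,
    estimated from a fixed-range histogram."""
    hist, _ = np.histogram(np.asarray(w).ravel(), bins=bins, range=value_range)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def rank_layers_for_low_precision(layers):
    """Order layer names by ascending weight entropy; earlier names are
    (heuristically) the best candidates for 2-bit rather than 4-bit weights."""
    entropies = {name: weight_entropy(w) for name, w in layers.items()}
    return sorted(entropies, key=entropies.get), entropies
```

A sharply peaked weight distribution occupies few histogram bins and thus scores a low entropy, flagging that layer for aggressive quantization.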
( 2
min )
The unfolding of detector effects is crucial for the comparison of data to
theory predictions. While traditional methods are limited to representing the
data in a low number of dimensions, machine learning has enabled new unfolding
techniques while retaining the full dimensionality. Generative networks like
invertible neural networks (INNs) enable a probabilistic unfolding, which maps
individual events to their corresponding unfolded probability distributions. The
accuracy of such methods is however limited by how well simulated training
samples model the actual data that is unfolded. We introduce the iterative
conditional INN (IcINN) for unfolding that adjusts for deviations between
simulated training samples and data. The IcINN unfolding is first validated on
toy data and then applied to pseudo-data for the $pp \to Z \gamma \gamma$
process.
( 2
min )
This paper shows that Physics-Informed Neural Networks (PINNs) can fail to
estimate the correct Partial Differential Equation (PDE) dynamics in the
presence of unknown changepoints in the parameters. To address this, we propose
a new CP-PINNs model, which integrates PINNs with a Total-Variation penalty for
accurate changepoint detection and PDE discovery. In order to optimally combine the
tasks of model fitting, PDE discovery, and changepoint detection, we develop a
new meta-learning algorithm that exploits batch learning to dynamically refine
the optimization objective when moving over consecutive batches of the data.
Empirically, in the case of changepoints in the dynamics, our approach
demonstrates accurate parameter estimation and model alignment; in the case of
no changepoints, it converges numerically to the solution of the original
PINNs model.
( 2
min )
Features in images' backgrounds can spuriously correlate with the images'
classes, representing background bias. They can influence the classifier's
decisions, causing shortcut learning (Clever Hans effect). The phenomenon
generates deep neural networks (DNNs) that perform well on standard evaluation
datasets but generalize poorly to real-world data. Layer-wise Relevance
Propagation (LRP) explains DNNs' decisions. Here, we show that the optimization
of LRP heatmaps can minimize the background bias influence on deep classifiers,
hindering shortcut learning. Because it adds no run-time computational cost,
the approach is light and fast. Furthermore, it applies to virtually any
classification architecture. After injecting synthetic bias in images'
backgrounds, we compared our approach (dubbed ISNet) to eight state-of-the-art
DNNs, quantitatively demonstrating its superior robustness to background bias.
Mixed datasets are common for COVID-19 and tuberculosis classification with
chest X-rays, fostering background bias. By focusing on the lungs, the ISNet
reduced shortcut learning. Thus, its generalization performance on external
(out-of-distribution) test databases significantly surpassed all implemented
benchmark models.
( 3
min )
Resistive memory is a promising alternative to SRAM, but is also an
inherently unstable device that requires substantial effort to ensure correct
read and write operations. To avoid the associated costs in terms of area, time
and energy, the present work is concerned with exploring how much noise in
memory operations can be tolerated by image classification tasks based on
neural networks. We introduce a special noisy operator that mimics the noise in
an exemplary resistive memory unit, explore the resilience of convolutional
neural networks on the CIFAR-10 classification task, and discuss a couple of
countermeasures to improve this resilience.
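A minimal version of such a noisy operator can be sketched as follows (assuming, for illustration, multiplicative Gaussian read noise; the paper's operator mimics a specific resistive-memory device and will differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_matmul(x, w, sigma=0.0):
    """Matrix multiply through a simulated resistive-memory crossbar:
    every stored weight is read back with multiplicative Gaussian noise
    of relative magnitude sigma (sigma=0 recovers the exact product)."""
    w_read = w * (1.0 + sigma * rng.standard_normal(w.shape))
    return x @ w_read

def classification_accuracy(x, w, labels, sigma):
    """Accuracy of a linear classifier whose weights are read noisily."""
    preds = noisy_matmul(x, w, sigma).argmax(axis=1)
    return float((preds == labels).mean())
```

Sweeping sigma and re-evaluating accuracy traces out the resilience curve studied in the paper, here for a toy linear model rather than a CNN on CIFAR-10.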
( 2
min )
Machine learning has emerged as a powerful solution to the modern challenges
in accelerator physics. However, the limited availability of beam time, the
computational cost of simulations, and the high-dimensionality of optimisation
problems pose significant challenges in generating the required data for
training state-of-the-art machine learning models. In this work, we introduce
Cheetah, a PyTorch-based high-speed differentiable linear-beam dynamics code.
Cheetah enables the fast collection of large data sets by reducing computation
times by multiple orders of magnitude and facilitates efficient gradient-based
optimisation for accelerator tuning and system identification. This positions
Cheetah as a user-friendly, readily extensible tool that integrates seamlessly
with widely adopted machine learning tools. We showcase the utility of Cheetah
through five examples, including reinforcement learning training,
gradient-based beamline tuning, gradient-based system identification,
physics-informed Bayesian optimisation priors, and modular neural network
surrogate modelling of space charge effects. The use of such a high-speed
differentiable simulation code will simplify the development of machine
learning-based methods for particle accelerators and fast-track their
integration into everyday operations of accelerator facilities.
( 2
min )
Heating, Ventilation, and Air Conditioning (HVAC) systems are a major driver
of energy consumption in commercial and residential buildings. Recent studies
have shown that Deep Reinforcement Learning (DRL) algorithms can outperform
traditional reactive controllers. However, DRL-based solutions are generally
designed for ad hoc setups and lack standardization for comparison. To fill
this gap, this paper provides a critical and reproducible evaluation, in terms
of comfort and energy consumption, of several state-of-the-art DRL algorithms
for HVAC control. The study examines the controllers' robustness, adaptability,
and trade-off between optimization goals by using the Sinergym framework. The
results obtained confirm the potential of DRL algorithms, such as SAC and TD3,
in complex scenarios and reveal several challenges related to generalization
and incremental learning.
( 2
min )
This article investigates the possibility of using the class entropy of the
output of a connectionist phoneme recogniser to predict time boundaries between
phonetic classes. The rationale is that, as a measure of uncertainty, the
entropy should increase in the proximity of a transition between two segments
that are well modelled (known) by the recognition network. The advantage of
this measure is its simplicity, as the posterior
probabilities of each class are available in connectionist phoneme recognition.
The entropy and a number of measures based on differentiation of the entropy
are used in isolation and in combination. The decision methods for predicting
the boundaries range from simple thresholds to a neural-network-based procedure.
The different methods are compared with respect to their precision, measured in
terms of the ratio between the number C of predicted boundaries within 10 or 20
msec of the reference and the total number of predicted boundaries, and recall,
measured as the ratio between C and the total number of reference boundaries.
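The measures described above are straightforward to sketch. Below is a hypothetical illustration (not the article's code): frame-wise entropy of the posteriors, a simple peak-plus-threshold boundary detector, and the precision/recall computation with a tolerance window.

```python
import numpy as np

def frame_entropy(posteriors):
    """Entropy of per-frame class posteriors (array of shape frames x classes)."""
    p = np.clip(posteriors, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def predict_boundaries(posteriors, threshold):
    """Predict a boundary at every local entropy maximum above the threshold."""
    h = frame_entropy(posteriors)
    peak = (h[1:-1] > h[:-2]) & (h[1:-1] >= h[2:]) & (h[1:-1] > threshold)
    return np.where(peak)[0] + 1

def precision_recall(predicted, reference, tol):
    """Precision = |predictions within tol of a reference| / |predictions|;
    recall = |reference boundaries matched| / |reference|."""
    predicted, reference = np.asarray(predicted), np.asarray(reference)
    if len(predicted) == 0:
        return 0.0, 0.0
    close = np.abs(predicted[:, None] - reference[None, :]) <= tol
    return float(close.any(axis=1).mean()), float(close.any(axis=0).mean())
```

On a synthetic two-class posterior sequence with a single uncertain frame at the segment transition, the detector fires exactly once, at that frame.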
( 2
min )
In this paper, we study the problem of estimating the normalizing constant
$\int e^{-\lambda f(x)}dx$ through queries to the black-box function $f$, where
$f$ belongs to a reproducing kernel Hilbert space (RKHS), and $\lambda$ is a
problem parameter. We show that to estimate the normalizing constant within a
small relative error, the level of difficulty depends on the value of
$\lambda$: When $\lambda$ approaches zero, the problem is similar to Bayesian
quadrature (BQ), while when $\lambda$ approaches infinity, the problem is
similar to Bayesian optimization (BO). More generally, the problem varies
between BQ and BO. We find that this pattern holds true even when the function
evaluations are noisy, bringing new aspects to this topic. Our findings are
supported by both algorithm-independent lower bounds and algorithmic upper
bounds, as well as simulation studies conducted on a variety of benchmark
functions.
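To fix ideas, the quantity in question can be computed deterministically in one dimension; a toy quadrature (standing in for the paper's query-based algorithms) makes the two regimes visible: for small $\lambda$ the integrand is spread out and the problem resembles quadrature, while for large $\lambda$ it concentrates around the minimizer of $f$. The function name and grid choices below are illustrative.

```python
import numpy as np

def estimate_z(f, lam, lo=-5.0, hi=5.0, n=20001):
    """Trapezoid-rule estimate of Z(lam), the integral of exp(-lam*f(x))
    over [lo, hi], on a uniform grid of n points."""
    x = np.linspace(lo, hi, n)
    y = np.exp(-lam * f(x))
    dx = x[1] - x[0]
    return float(dx * (y.sum() - 0.5 * (y[0] + y[-1])))
```

For f(x) = x² the exact value over the real line is sqrt(pi/lam), so the truncated estimate matches closely for moderate lam; as lam grows, -log Z(lam)/lam approaches min f, the Bayesian-optimization regime.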
( 2
min )
We study reinforcement learning in the presence of an unknown reward
perturbation. Existing methodologies for this problem make strong assumptions
including reward smoothness, known perturbations, and/or perturbations that do
not modify the optimal policy. We study the case of unknown arbitrary
perturbations that discretize and shuffle reward space, but have the property
that the true reward belongs to the most frequently observed class after
perturbation. This class of perturbations generalizes existing classes (and, in
the limit, all continuous bounded perturbations) and defeats existing methods.
We introduce an adaptive distributional reward critic and show theoretically
that it can recover the true rewards under technical conditions. Under the
targeted perturbation in discrete and continuous control tasks, we win/tie the
highest return in 40/57 settings (compared to 16/57 for the best baseline).
Even under the untargeted perturbation, we retain an edge over the baseline
designed specifically for that setting.
( 2
min )
Control Barrier Functions (CBFs) provide an elegant framework for designing
safety filters for nonlinear control systems by constraining their trajectories
to an invariant subset of a prespecified safe set. However, the task of finding
a CBF that concurrently maximizes the volume of the resulting control invariant
set while accommodating complex safety constraints, particularly in high
relative degree systems with actuation constraints, continues to pose a
substantial challenge. In this work, we propose a novel self-supervised
learning framework that holistically addresses these hurdles. Given a Boolean
composition of multiple state constraints that define the safe set, our
approach starts with building a single continuously differentiable function
whose 0-superlevel set provides an inner approximation of the safe set. We then
use this function together with a smooth neural network to parameterize the CBF
candidate. Finally, we design a training loss function based on a
Hamilton-Jacobi partial differential equation to train the CBF while enlarging
the volume of the induced control invariant set. We demonstrate the
effectiveness of our approach via numerical experiments.
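For reference, the standard CBF condition underlying such constructions (a textbook statement, not this paper's specific loss) is, for a control-affine system $\dot{x} = f(x) + g(x)u$ and a candidate $h$ whose 0-superlevel set approximates the safe set:

```latex
\sup_{u \in \mathcal{U}} \; \nabla h(x)^{\top}\bigl(f(x) + g(x)\,u\bigr)
  \;\geq\; -\alpha\bigl(h(x)\bigr) \qquad \text{for all } x,
```

for some extended class-$\mathcal{K}$ function $\alpha$; any Lipschitz controller satisfying this inequality renders the set $\{x : h(x) \geq 0\}$ forward invariant.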
( 2
min )
Transfer learning (TL) is an increasingly popular approach to training deep
learning (DL) models that leverages the knowledge gained by training a
foundation model on diverse, large-scale datasets for use on downstream tasks
where less domain- or task-specific data is available. The literature is rich
with TL techniques and applications; however, the bulk of the research makes
use of deterministic DL models which are often uncalibrated and lack the
ability to communicate a measure of epistemic (model) uncertainty in
prediction. Unlike their deterministic counterparts, Bayesian DL (BDL) models
are often well-calibrated, provide access to epistemic uncertainty for a
prediction, and are capable of achieving competitive predictive performance. In
this study, we propose variational inference pre-trained audio neural networks
(VI-PANNs). VI-PANNs are a variational inference variant of the popular
ResNet-54 architecture, pre-trained on AudioSet, a large-scale audio
event detection dataset. We evaluate the quality of the resulting uncertainty
when transferring knowledge from VI-PANNs to other downstream acoustic
classification tasks using the ESC-50, UrbanSound8K, and DCASE2013 datasets. We
demonstrate, for the first time, that it is possible to transfer calibrated
uncertainty information along with knowledge from upstream tasks to enhance a
model's capability to perform downstream tasks.
( 2
min )
The problem of data clustering is one of the most important in data analysis.
It can be problematic when dealing with experimental data characterized by
measurement uncertainties and errors. Our paper proposes a recursive scheme for
clustering data obtained in geographical (climatological) experiments. A
discussion of results obtained by the k-means and SOM methods combined with the
developed recursive procedure is presented. We show that clustering with the
new approach gives more acceptable results when compared to experts' assessments.
( 2
min )
Graph neural networks (GNN) are a powerful tool for combining imaging and
non-imaging medical information for node classification tasks. Cross-network
node classification extends GNN techniques to account for domain drift,
allowing for node classification on an unlabeled target network. In this paper
we present OTGCN, a powerful, novel approach to cross-network node
classification. This approach leans on concepts from graph convolutional
networks to harness insights from graph data structures while simultaneously
applying strategies rooted in optimal transport to correct for the domain drift
that can occur between samples from different data collection sites. This
blended approach provides a practical solution for scenarios with many distinct
forms of data collected across different locations and equipment. We
demonstrate the effectiveness of this approach at classifying Autism Spectrum
Disorder subjects using a blend of imaging and non-imaging data.
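For intuition on the optimal-transport component, the following self-contained NumPy sketch (an entropy-regularized Sinkhorn solver on toy node features, not the OTGCN implementation) shows how a transport plan couples source and target samples under simulated domain drift:

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=500):
    """Entropy-regularized optimal transport via Sinkhorn-Knopp.

    a, b: marginal weights; cost: pairwise cost matrix. Returns a plan
    whose rows sum to a and whose columns converge to b."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
src = rng.standard_normal((5, 3))        # "source network" node features
tgt = rng.standard_normal((4, 3)) + 1.0  # shifted: simulated domain drift
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
cost /= cost.max()                       # normalize the cost scale
plan = sinkhorn(np.full(5, 0.2), np.full(4, 0.25), cost)
print(plan.sum(axis=1), plan.sum(axis=0))
```

The plan's marginals match the prescribed weights, which is what lets features from one site be realigned to another.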
( 2
min )
This paper introduces a new problem in the field of graph mining and social
network analysis called new node prediction. More technically, the task can be
categorized as zero-shot out-of-graph all-links prediction. This challenging
problem aims to predict all links from a new, isolated, and unobserved node
that was previously disconnected from the graph. Unlike classic approaches to
link prediction (including few-shot out-of-graph link prediction), this problem
presents two key differences: (1) the new node has no existing links from which
to extract patterns for new predictions; and (2) the goal is to predict not
just one, but all the links of this new node, or at least a significant part of
them. Experiments demonstrate that an architecture based on Deep Graph Neural
Networks can learn to solve this challenging problem in a bibliographic
citation network.
( 2
min )
We present a nonparametric method for outlier detection that takes full
account of local variations in intrinsic dimensionality within the dataset.
Using the theory of Local Intrinsic Dimensionality (LID), our
'dimensionality-aware' outlier detection method, DAO, is derived as an
estimator of an asymptotic local expected density ratio involving the query
point and a close neighbor drawn at random. The dimensionality-aware behavior
of DAO is due to its use of local estimation of LID values in a
theoretically-justified way. Through comprehensive experimentation on more than
800 synthetic and real datasets, we show that DAO significantly outperforms
three popular and important benchmark outlier detection methods: Local Outlier
Factor (LOF), Simplified LOF, and kNN.
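To illustrate the local estimation DAO builds on, here is a hedged sketch of a standard maximum-likelihood LID estimator computed from nearest-neighbor distances; the synthetic data and the estimator form are illustrative, not the paper's exact pipeline:

```python
import numpy as np

def lid_mle(dists):
    """Maximum-likelihood estimate of Local Intrinsic Dimensionality from
    the sorted distances of a query point to its k nearest neighbors."""
    r_k = dists[-1]
    return -1.0 / np.mean(np.log(dists[:-1] / r_k))

def knn_dists(data, q, k):
    d = np.sort(np.linalg.norm(data - q, axis=1))
    return d[1:k + 1]        # drop the zero self-distance

rng = np.random.default_rng(1)
# 2-D manifold embedded in 5-D vs. a genuinely 5-D point cloud
low = np.c_[rng.standard_normal((2000, 2)), np.zeros((2000, 3))]
high = rng.standard_normal((2000, 5))
print(lid_mle(knn_dists(low, low[0], 50)),
      lid_mle(knn_dists(high, high[0], 50)))
```

The estimates track the intrinsic (2 vs. 5) rather than the representational dimension, which is the signal a dimensionality-aware detector can exploit.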
( 2
min )
We propose two novel purpose-built deep learning (DL) models for synthesis of
the arterial blood pressure (ABP) waveform in a cuff-less manner, using a
single-site photoplethysmography (PPG) signal. We utilize the public UCI
dataset on cuff-less blood pressure (CLBP) estimation to train and evaluate our
DL models. Firstly, we implement a transformer model that incorporates
positional encoding, multi-head attention, layer normalization, and dropout
techniques, and synthesizes the ABP waveform with a mean absolute error (MAE)
of 14. Secondly, we implement a frequency-domain (FD) learning approach where
we first obtain the discrete cosine transform (DCT) coefficients of the PPG and
ABP signals corresponding to two cardiac cycles, and then learn a
linear/non-linear (L/NL) regression between them. We learn that the FD L/NL
regression model outperforms the transformer model by achieving an MAE of 11.87
and 8.01, for diastolic blood pressure (DBP) and systolic blood pressure (SBP),
respectively. Our FD L/NL regression model also fulfills the AAMI criterion of
utilizing data from more than 85 subjects, and achieves grade B by the BHS
criterion.
( 2
min )
The advent of compact, handheld devices has given us a pool of tracked
movement data from which trends and patterns can be inferred and put to use.
With this flood of trajectory data from animals, humans, vehicles, etc., the
idea of ANALYTiC originated: using active learning to infer
semantic annotations from the trajectories by learning from sets of labeled
data. This study explores the application of dimensionality reduction and
decision boundaries in combination with the already present active learning,
highlighting patterns and clusters in data. We test these features with three
different trajectory datasets, with the objective of exploiting the already
labeled data and enhancing its interpretability. Our experimental analysis
exemplifies the potential of these combined methodologies in improving the
efficiency and accuracy of trajectory labeling. This study serves as a
stepping-stone towards the broader integration of machine learning and visual
methods in the context of movement data analysis.
( 2
min )
Financial data is generally time series in essence and thus suffers from
three fundamental issues: the mismatch in time resolution, the time-varying
property of the distribution - nonstationarity, and causal factors that are
important but unknown/unobserved. In this paper, we follow a causal perspective
to systematically look into these three demons in finance. Specifically, we
reexamine these issues in the context of causality, which gives rise to a novel
and inspiring understanding of how the issues can be addressed. Following this
perspective, we provide systematic solutions to these problems, which hopefully
would serve as a foundation for future research in the area.
( 2
min )
In this work, we propose a denoising diffusion generative model (DDGM)
trained with healthy electrocardiogram (ECG) data that focuses on ECG
morphology and inter-lead dependence. Our results show that this innovative
generative model can successfully generate realistic ECG signals. Furthermore,
we explore the application of recent breakthroughs in solving linear inverse
Bayesian problems using DDGM. This approach enables the development of several
important clinical tools. These include the calculation of corrected QT
intervals (QTc), effective noise suppression of ECG signals, recovery of
missing ECG leads, and identification of anomalous readings, enabling
significant advances in cardiac health monitoring and diagnosis.
( 2
min )
In automotive applications, frequency modulated continuous wave (FMCW) radar
is an established technology to determine the distance, velocity and angle of
objects in the vicinity of the vehicle. The quality of predictions might be
seriously impaired if mutual interference between radar sensors occurs.
Previous work processes data from the entire receiver array in parallel to
increase interference mitigation quality using neural networks (NNs). However,
these architectures do not generalize well across different angles of arrival
(AoAs) of interferences and objects. In this paper we introduce a fully
convolutional neural network (CNN) with rank-three convolutions that is able
to transfer learned patterns between different AoAs. Our proposed architecture
outperforms previous work while having higher robustness and a lower number of
trainable parameters. We evaluate our network on a diverse data set and
demonstrate its angle equivariance.
( 2
min )
Accurate RNA secondary structure prediction is vital for understanding
cellular regulation and disease mechanisms. Deep learning (DL) methods have
surpassed traditional algorithms by predicting complex features like
pseudoknots and multi-interacting base pairs. However, traditional distance
measures can hardly deal with such tertiary interactions and the currently used
evaluation measures (F1 score, MCC) have limitations. We propose the
Weisfeiler-Lehman graph kernel (WL) as an alternative metric. Embracing
graph-based metrics like WL enables fair and accurate evaluation of RNA
structure prediction algorithms. Further, WL provides informative guidance, as
demonstrated in an RNA design experiment.
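A minimal sketch of the Weisfeiler-Lehman relabeling idea on toy labeled graphs (pure Python, not an RNA-specific implementation; the graphs and labels are hypothetical):

```python
from collections import Counter

def wl_histograms(adj, labels, n_iter=2):
    """Weisfeiler-Lehman relabeling: repeatedly replace each node's label by
    (own label, sorted multiset of neighbor labels) and record histograms."""
    hists = [Counter(labels.values())]
    for _ in range(n_iter):
        labels = {v: (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                  for v in adj}
        hists.append(Counter(labels.values()))
    return hists

def wl_kernel(g1, g2, n_iter=2):
    """Sum over WL iterations of label-histogram dot products."""
    return sum(sum(c1[k] * c2[k] for k in c1)
               for c1, c2 in zip(wl_histograms(*g1, n_iter),
                                 wl_histograms(*g2, n_iter)))

# Toy labeled graphs: a 3-node path vs. a triangle (labels mimic RNA bases)
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "U", 2: "G"})
tri = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "A", 1: "U", 2: "G"})
print(wl_kernel(path, path), wl_kernel(path, tri))
```

Because the kernel compares neighborhood structure rather than position-wise base pairs, it naturally scores structures that differ in pseudoknot-like connectivity.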
( 2
min )
Robustness certification, which aims to formally certify the predictions of
neural networks against adversarial inputs, has become an important tool for
safety-critical applications. Despite considerable progress,
existing certification methods are limited to elementary architectures, such as
convolutional networks, recurrent networks and recently Transformers, on
benchmark datasets such as MNIST. In this paper, we focus on the robustness
certification of scene text recognition (STR), which is a complex and
extensively deployed image-based sequence prediction problem. We tackle three
types of STR model architectures, including the standard STR pipelines and the
Vision Transformer. We propose STR-Cert, the first certification method for STR
models, by significantly extending the DeepPoly polyhedral verification
framework via deriving novel polyhedral bounds and algorithms for key STR model
components. Finally, we certify and compare STR models on six datasets,
demonstrating the efficiency and scalability of robustness certification,
particularly for the Vision Transformer.
( 2
min )
This study presents an unsupervised machine learning approach for optimizing
Profit and Loss (PnL) in quantitative finance. Our algorithm, akin to an
unsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL
generated from signals constructed linearly from exogenous variables. The
methodology employs a linear relationship between exogenous variables and the
trading signal, with the objective of maximizing the Sharpe Ratio through
parameter optimization. Empirical application on an ETF representing U.S.
Treasury bonds demonstrates the model's effectiveness, supported by
regularization techniques to mitigate overfitting. The study concludes with
potential avenues for further development, including generalized time steps and
enhanced corrective terms.
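The core objective can be sketched in a few lines of NumPy: build a signal linearly from exogenous variables, compute the PnL, and search for weights that maximize its Sharpe Ratio. The random hill-climbing optimizer and the synthetic data below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def sharpe(w, X, r):
    """Sharpe Ratio of the PnL from a linear signal s_t = X_t . w."""
    pnl = (X @ w) * r
    return pnl.mean() / (pnl.std() + 1e-12)

rng = np.random.default_rng(0)
T = 2000
X = rng.standard_normal((T, 3))              # exogenous variables
r = 0.2 * X[:, 0] + rng.standard_normal(T)   # returns partly driven by X[:, 0]

# Illustrative random hill climb on the unit sphere (not the paper's
# optimizer); constraining ||w|| = 1 acts as a simple regularizer.
w = rng.standard_normal(3)
w /= np.linalg.norm(w)
best = sharpe(w, X, r)
for _ in range(3000):
    cand = w + 0.1 * rng.standard_normal(3)
    cand /= np.linalg.norm(cand)
    s = sharpe(cand, X, r)
    if s > best:
        w, best = cand, s
print(best, w)
```

The optimizer concentrates weight on the variable that actually drives returns, which is the behavior the unsupervised objective is designed to elicit.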
( 2
min )
The fairness of Natural Language Processing (NLP) models has emerged as a
crucial concern. Information theory indicates that to achieve fairness, a model
should not be able to predict sensitive variables, such as gender, ethnicity,
and age. However, information related to these variables often appears
implicitly in language, posing a challenge in identifying and mitigating biases
effectively. To tackle this issue, we present a novel approach that operates at
the embedding level of an NLP model, independent of the specific architecture.
Our method leverages insights from recent advances in XAI techniques and
employs an embedding transformation to eliminate implicit information from a
selected variable. By directly manipulating the embeddings in the final layer,
our approach enables a seamless integration into existing models without
requiring significant modifications or retraining. In evaluation, we show that
the proposed post-hoc approach significantly reduces gender-related
associations in NLP models while preserving the overall performance and
functionality of the models. An implementation of our method is available:
https://github.com/fanny-jourdan/TaCo
( 2
min )
The paper shows that Physics-Informed Neural Networks (PINNs) can fail to
estimate the correct Partial Differential Equations (PDEs) dynamics in cases of
unknown changepoints in the parameters. To address this, we propose a new
CP-PINNs model which integrates PINNs with a Total-Variation penalty for
accurate changepoint detection and PDE discovery. To optimally combine the
tasks of model fitting, PDE discovery, and changepoint detection, we develop a
new meta-learning algorithm that exploits batch learning to dynamically refine
the optimization objective while moving over consecutive batches of the data.
Empirically, in the case of changepoints in the dynamics, our approach
demonstrates accurate parameter estimation and model alignment, and in case of
no changepoints in the data, it converges numerically to the solution from the
original PINNs model.
( 2
min )
In this paper, we study the problem of estimating the normalizing constant
$\int e^{-\lambda f(x)}dx$ through queries to the black-box function $f$, where
$f$ belongs to a reproducing kernel Hilbert space (RKHS), and $\lambda$ is a
problem parameter. We show that to estimate the normalizing constant within a
small relative error, the level of difficulty depends on the value of
$\lambda$: When $\lambda$ approaches zero, the problem is similar to Bayesian
quadrature (BQ), while when $\lambda$ approaches infinity, the problem is
similar to Bayesian optimization (BO). More generally, the problem varies
between BQ and BO. We find that this pattern holds true even when the function
evaluations are noisy, bringing new aspects to this topic. Our findings are
supported by both algorithm-independent lower bounds and algorithmic upper
bounds, as well as simulation studies conducted on a variety of benchmark
functions.
( 2
min )
“This year, every industry will become a technology industry,” NVIDIA founder and CEO Jensen Huang told attendees Wednesday during the annual J.P. Morgan Healthcare Conference. “You can now recognize and learn the language of almost anything with structure, and you can translate it to anything with structure — so text-protein, protein-text,” Huang said.
( 6
min )
Enterprises have access to massive amounts of data, much of which is difficult to discover because the data is unstructured. Conventional approaches to analyzing unstructured data use keyword or synonym matching. They don’t capture the full context of a document, making them less effective in dealing with unstructured data. In contrast, text embeddings use machine […]
( 12
min )
The ability to accurately approximate trajectories of dynamical systems
enables their analysis, prediction, and control. Neural network (NN)-based
approximations have attracted significant interest due to fast evaluation with
good accuracy over long integration time steps. In contrast to established
numerical approximation schemes such as Runge-Kutta methods, the estimation of
the error of the NN-based approximations proves to be difficult. In this work,
we propose to use the NN's predictions in a high-order implicit Runge-Kutta
(IRK) method. The residuals in the implicit system of equations can be related
to the NN's prediction error, hence, we can provide an error estimate at
several points along a trajectory. We find that this error estimate highly
correlates with the NN's prediction error and that increasing the order of the
IRK method improves this estimate. We demonstrate this estimation methodology
for Physics-Informed Neural Networks (PINNs) on the logistic equation as an
illustrative example and then apply it to a four-state electric generator model
that is regularly used in power system modelling.
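The residual idea can be sketched for the logistic equation: plug a surrogate's predictions into the implicit midpoint rule (a one-stage Gauss IRK method) and read off the residual. The surrogate below is a perturbed exact solution standing in for a trained NN, an assumption made purely for illustration:

```python
import numpy as np

def f(y):                       # logistic right-hand side: y' = y (1 - y)
    return y * (1.0 - y)

def exact(t, y0=0.1):           # closed-form logistic solution
    return 1.0 / (1.0 + (1.0 / y0 - 1.0) * np.exp(-t))

def surrogate(t, err_scale):
    """Stand-in for an NN approximation: the exact solution plus a smooth
    perturbation of known size (an assumption for illustration)."""
    return exact(t) + err_scale * np.sin(3.0 * t)

def midpoint_residual(yh, t, h):
    """Residual of the implicit midpoint rule (1-stage Gauss IRK method)
    evaluated on the surrogate's own predictions."""
    y0, y1 = yh(t), yh(t + h)
    return abs(y1 - y0 - h * f(0.5 * (y0 + y1)))

t, h = 1.0, 0.1
res = [midpoint_residual(lambda s, e=e: surrogate(s, e), t, h)
       for e in (1e-4, 1e-3, 1e-2)]
print(res)   # residual grows with the surrogate's true error
```

Since the residual is computable without the exact solution, it serves as the kind of a posteriori error indicator the paper proposes.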
( 2
min )
Bayesian Neural Networks (BayNNs) can inherently estimate predictive
uncertainty, facilitating informed decision-making. Dropout-based BayNNs are
increasingly implemented in spintronics-based computation-in-memory
architectures for resource-constrained yet high-performance safety-critical
applications. Although uncertainty estimation is important, the reliability of
Dropout generation and BayNN computation is equally important for target
applications but is overlooked in existing works. However, testing BayNNs is
significantly more challenging than testing conventional NNs, due to their
stochastic nature. In this paper, we present for the first time the model of
the non-idealities of the spintronics-based Dropout module and analyze their
impact on uncertainty estimates and accuracy. Furthermore, we propose a testing
framework based on repeatability ranking for Dropout-based BayNN with up to
$100\%$ fault coverage while using only $0.2\%$ of training data as test
vectors.
( 2
min )
The reliable diagnosis of cardiac conditions through electrocardiogram (ECG)
analysis critically depends on accurately detecting P waves and measuring the
PR interval. However, achieving consistent and generalizable diagnoses across
diverse populations presents challenges due to the inherent global variations
observed in ECG signals. This paper is focused on applying the Q learning
reinforcement algorithm to the various ECG datasets available in the
PhysioNet/Computing in Cardiology Challenge (CinC). Five ECG beats, including
Normal Sinus Rhythm, Atrial Flutter, Atrial Fibrillation, 1st Degree
Atrioventricular Block, and Left Atrial Enlargement, are included to study
variations of P waves and PR Interval on Lead II and Lead V1. Q-Agent
classified 71,672 beat samples in 8,867 patients with an average accuracy of
90.4% and only 9.6% average Hamming loss over misclassifications. The average
classification time at the 100th episode containing around 40,000 samples is
0.04 seconds. An average training reward of 344.05 is achieved at an alpha,
gamma, and SoftMax temperature rate of 0.001, 0.9, and 0.1, respectively.
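A toy sketch of the tabular Q-learning setup described above, with states as beat classes, a SoftMax behavior policy, and the cited gamma and temperature; the larger learning rate and the synthetic reward are assumptions so the toy run converges quickly:

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 5                      # the five beat types as states/actions
Q = np.zeros((n_classes, n_classes))
alpha, gamma, tau = 0.1, 0.9, 0.1  # learning rate, discount, SoftMax temperature
# (alpha is larger than the paper's 0.001 so this toy run converges quickly)

def softmax_policy(q_row, tau):
    z = q_row / tau
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

for episode in range(3000):
    s = rng.integers(n_classes)                     # observed beat "state"
    a = rng.choice(n_classes, p=softmax_policy(Q[s], tau))
    reward = 1.0 if a == s else -1.0                # correct classification?
    s_next = rng.integers(n_classes)
    Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])

accuracy = (Q.argmax(axis=1) == np.arange(n_classes)).mean()
print(accuracy)
```

The greedy policy recovered from the learned Q-table classifies every toy state correctly once the temperature-driven exploration has sampled each action enough.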
( 2
min )
We propose a novel high-performance, interpretable, and parameter- and
computation-efficient deep learning architecture for tabular data, Gated
Adaptive Network for Deep Automated Learning of Features (GANDALF). GANDALF
relies on a new tabular processing unit with a gating mechanism and in-built
feature selection called Gated Feature Learning Unit (GFLU) as a feature
representation learning unit. We demonstrate that GANDALF outperforms or stays
at-par with SOTA approaches like XGBoost, SAINT, FT-Transformers, etc. by
experiments on multiple established public benchmarks. We have made available
the code at github.com/manujosephv/pytorch_tabular under MIT License.
( 2
min )
In this paper, we consider a wireless network of smart sensors (agents) that
monitor a dynamical process and send measurements to a base station that
performs global monitoring and decision-making. Smart sensors are equipped with
both sensing and computation, and can either send raw measurements or process
them prior to transmission. Constrained agent resources raise a fundamental
latency-accuracy trade-off. On the one hand, raw measurements are inaccurate
but fast to produce. On the other hand, data processing on resource-constrained
platforms generates accurate measurements at the cost of non-negligible
computation latency. Further, if processed data are also compressed, latency
caused by wireless communication might be higher for raw measurements. Hence,
it is challenging to decide when and where sensors in the network should
transmit raw measurements or leverage time-consuming local processing. To
tackle this design problem, we propose a Reinforcement Learning approach to
learn an efficient policy that dynamically decides when measurements are to be
processed at each sensor. The effectiveness of our proposed approach is validated
through a numerical simulation with a case study on smart sensing motivated by
the Internet of Drones.
( 3
min )
Models trained with empirical risk minimization (ERM) are known to learn to
rely on spurious features, i.e., their prediction is based on undesired
auxiliary features which are strongly correlated with class labels but lack
causal reasoning. This behavior particularly degrades accuracy in groups of
samples of the correlated class that are missing the spurious feature or
samples of the opposite class but with the spurious feature present. The
recently proposed Deep Feature Reweighting (DFR) method improves accuracy of
these worst groups. Based on the main argument that ERM models can learn core
features sufficiently well, DFR only needs to retrain the last layer of the
classification model with a small group-balanced data set. In this work, we
examine the applicability of DFR to realistic data in the medical domain.
Furthermore, we investigate the reasoning behind the effectiveness of
last-layer retraining and show that even though DFR has the potential to
improve the accuracy of the worst group, it remains susceptible to spurious
correlations.
( 2
min )
We propose a new training algorithm, named DualFL (Dualized Federated
Learning), for solving distributed optimization problems in federated learning.
DualFL achieves communication acceleration for very general convex cost
functions, thereby providing a solution to an open theoretical problem in
federated learning concerning cost functions that may be neither smooth nor
strongly convex. We provide a detailed analysis for the local iteration
complexity of DualFL to ensure the overall computational efficiency of DualFL.
Furthermore, we introduce a completely new approach for the convergence
analysis of federated learning based on a dual formulation. This new technique
enables concise and elegant analysis, which contrasts the complex calculations
used in existing literature on convergence of federated learning algorithms.
( 2
min )
Effective disaster response is critical for affected communities. Responders
and decision-makers would benefit from reliable, timely measures of the issues
impacting their communities during a disaster, and social media offers a
potentially rich data source. Social media can reflect public concerns and
demands during a disaster, offering valuable insights for decision-makers to
understand evolving situations and optimize resource allocation. We used
Bidirectional Encoder Representations from Transformers (BERT) topic modeling
to cluster topics from Twitter data. Then, we conducted a temporal-spatial
analysis to examine the distribution of these topics across different regions
during the 2020 western U.S. wildfire season. Our results show that Twitter
users mainly focused on three topics: "health impact," "damage," and
"evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to
explore the magnitude and velocity of topic diffusion on Twitter. The results
displayed a clear relationship between topic trends and wildfire propagation
patterns. The parameters estimated by the SIR model for selected cities
revealed that residents exhibited a high level of concern across several
topics during the wildfire. Our study details how the SIR model and topic modeling
using social media data can provide decision-makers with a quantitative
approach to measure disaster response and support their decision-making
processes.
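The SIR dynamics referenced here can be sketched with a forward-Euler integrator; the parameter values below are illustrative, not the ones estimated in the study:

```python
import numpy as np

def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations
       dS/dt = -beta*S*I,   dI/dt = beta*S*I - gamma*I,   dR/dt = gamma*I,
    with S, I, R as population fractions (here: users unaware of a topic,
    actively posting about it, and no longer posting)."""
    s, i, r = s0, i0, r0
    traj = [(s, i, r)]
    for _ in range(int(days / dt)):
        spread = beta * s * i             # "infection": topic spreads
        fade = gamma * i                  # "recovery": interest fades
        s, i, r = s - dt * spread, i + dt * (spread - fade), r + dt * fade
        traj.append((s, i, r))
    return np.array(traj)

traj = simulate_sir(beta=0.4, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, days=120)
print(traj[:, 1].max())   # magnitude of the topic's peak
```

Fitting beta and gamma to observed topic counts is what yields the magnitude and velocity measures discussed in the abstract.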
( 3
min )
Monotone missing data is a common problem in data analysis. However,
imputation combined with dimensionality reduction can be computationally
expensive, especially with the increasing size of datasets. To address this
issue, we propose a Blockwise principal component analysis Imputation (BPI)
framework for dimensionality reduction and imputation of monotone missing data.
The framework conducts Principal Component Analysis (PCA) on the observed part
of each monotone block of the data and then imputes missing values after
merging the obtained principal components, using a chosen imputation technique.
various imputation techniques and can significantly reduce imputation time
compared to conducting dimensionality reduction after imputation. This makes it
a practical and efficient approach for large datasets with monotone missing
data. Our experiments validate the improvement in speed. In addition, our
experiments also show that while applying MICE imputation directly on missing
data may not yield convergence, applying BPI with MICE for the data may lead to
convergence.
( 2
min )
In continual learning (CL), an AI agent (e.g., autonomous vehicles or
robotics) learns from non-stationary data streams under dynamic environments.
For the practical deployment of such applications, it is important to guarantee
robustness to unseen environments while maintaining past experiences. In this
paper, a novel CL framework is proposed to achieve robust generalization to
dynamic environments while retaining past knowledge. The considered CL agent
uses a capacity-limited memory to save previously observed environmental
information to mitigate forgetting issues. Then, data points are sampled from
the memory to estimate the distribution of risks over environmental change so
as to obtain predictors that are robust to unseen changes. The generalization
and memorization performance of the proposed framework are theoretically
analyzed. This analysis showcases the tradeoff between memorization and
generalization with the memory size. Experiments show that the proposed
algorithm outperforms memory-based CL baselines across all environments while
significantly improving the generalization performance on unseen target
environments.
( 2
min )
In this paper, we design a real-time question-answering system specifically
targeted for helping sellers get relevant material/documentation they can share
live with their customers or refer to during a call. Taking the Seismic content
repository as a relatively large scale example of a diverse dataset of sales
material, we demonstrate how LLM embeddings of sellers' queries can be matched
with the relevant content. We achieve this by engineering prompts in an
elaborate fashion that makes use of the rich set of meta-features available for
documents and sellers. Using a bi-encoder with cross-encoder re-ranker
architecture, we show how the solution returns the most relevant content
recommendations in just a few seconds even for large datasets. Our recommender
system is deployed as an AML endpoint for real-time inferencing and has been
integrated into a Copilot interface that is now deployed in the production
version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers.
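The two-stage retrieval pattern can be sketched with toy vectors: a fast bi-encoder pass over the whole corpus, then an expensive cross-encoder re-rank of only the top-k candidates. The encoders below are hypothetical stand-ins, not the deployed models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned models: a bi-encoder embeds queries and
# documents independently (fast, cacheable); a cross-encoder scores a
# (query, document) pair jointly (accurate, but too slow for the full corpus).
def bi_encode(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cross_score(q, d):
    return -np.linalg.norm(q - d)         # toy joint relevance score

docs = rng.standard_normal((10_000, 64))  # toy content repository
query = docs[42] + 0.05 * rng.standard_normal(64)

# Stage 1: retrieve top-k candidates by cosine similarity
scores = bi_encode(docs) @ bi_encode(query)
top_k = np.argsort(scores)[-20:]

# Stage 2: re-rank only the k candidates with the expensive scorer
best = max(top_k, key=lambda i: cross_score(query, docs[i]))
print(best)
```

Keeping the expensive scorer off the full corpus is what makes sub-second latency feasible even for large document sets.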
( 2
min )
Mobile autonomy relies on the precise perception of dynamic environments.
Robustly tracking moving objects in 3D world thus plays a pivotal role for
applications like trajectory prediction, obstacle avoidance, and path planning.
While most current methods utilize LiDARs or cameras for Multiple Object
Tracking (MOT), the capabilities of 4D imaging radars remain largely
unexplored. Recognizing the challenges posed by radar noise and point sparsity
in 4D radar data, we introduce RaTrack, an innovative solution tailored for
radar-based tracking. Bypassing the typical reliance on specific object types
and 3D bounding boxes, our method focuses on motion segmentation and
clustering, enriched by a motion estimation module. Evaluated on the
View-of-Delft dataset, RaTrack showcases superior tracking precision of moving
objects, largely surpassing the performance of the state of the art.
( 2
min )
To investigate the processing of speech in the brain, simple linear models
are commonly used to establish a relationship between brain signals and speech
features. However, these linear models are ill-equipped to model a highly
dynamic and complex non-linear system like the brain. Although non-linear
methods with neural networks have been developed recently, reconstructing
unseen stimuli from unseen subjects' EEG is still a highly challenging task.
This work presents a novel method, ConvConcatNet, to reconstruct
mel-spectrograms from EEG, combining a deep convolutional neural network with
extensive concatenation operations. With our ConvConcatNet model, the
Pearson correlation between the reconstructed and the target mel-spectrogram
can achieve 0.0420, which was ranked as No.1 in the Task 2 of the Auditory EEG
Challenge. The codes and models to implement our work will be available on
Github: https://github.com/xuxiran/ConvConcatNet
( 2
min )
To address the possible scarcity or total absence of pulses from particle
detectors during the development of their associated electronics, we propose a
model that can generate pulses without losing the features of real ones. This
model is based on artificial neural networks, namely Generative Adversarial
Networks (GAN). We describe the proposed network architecture, its training
methodology and the approach to train the GAN with real pulses from a
scintillator receiving radiation from sources of ${}^{137}$Cs and ${}^{22}$Na.
The Generator was installed in a Xilinx's System-On-Chip (SoC). We show how the
network is capable of generating pulses with the same shape as the real ones
that even match the data distributions in the original pulse-height histogram
data.
( 2
min )
We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS
acoustic modelling, trained using optimal-transport conditional flow matching
(OT-CFM). This yields an ODE-based decoder capable of high output quality in
fewer synthesis steps than models trained using score matching. Careful design
choices additionally ensure each synthesis step is fast to run. The method is
probabilistic, non-autoregressive, and learns to speak from scratch without
external alignments. Compared to strong pre-trained baseline models, the
Matcha-TTS system has the smallest memory footprint, rivals the speed of the
fastest models on long utterances, and attains the highest mean opinion score
in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for
audio examples, code, and pre-trained models.
( 2
min )
Transformers require a fixed number of layers and heads, which makes them
inflexible to the complexity of individual samples and expensive in training
and inference. To address this, we propose a sample-based Dynamic Hierarchical
Transformer (DHT) model whose layers and heads can be dynamically configured
with single data samples via solving contextual bandit problems. To determine
the number of layers and heads, we use the Upper Confidence Bound while we
deploy combinatorial Thompson Sampling in order to select specific head
combinations given their number. Different from previous work that focuses on
compressing trained networks for inference only, DHT is not only advantageous
for adaptively optimizing the underlying network architecture during training
but also has a flexible network for efficient inference. To the best of our
knowledge, this is the first comprehensive data-driven dynamic transformer
without any additional auxiliary neural networks that implement the dynamic
system. According to the experiment results, we achieve up to 74% computational
savings for both training and inference with a minimal loss of accuracy.
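A minimal sketch of the combinatorial Thompson Sampling idea for picking a subset of heads: Beta-Bernoulli bandits with synthetic per-head rewards; the setup is illustrative, not DHT's actual training loop:

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads = 8
# Hidden per-head "usefulness" for the current samples (unknown to the
# learner); heads 0-2 are genuinely useful in this toy setup.
true_p = np.array([0.9, 0.85, 0.8, 0.3, 0.3, 0.3, 0.3, 0.3])

a_post = np.ones(n_heads)    # Beta posterior parameters per head
b_post = np.ones(n_heads)

for _ in range(2000):
    theta = rng.beta(a_post, b_post)        # one sampled belief per head
    chosen = np.argsort(theta)[-3:]         # combinatorial choice: 3 heads
    reward = rng.random(n_heads) < true_p   # per-head bandit feedback
    a_post[chosen] += reward[chosen]
    b_post[chosen] += ~reward[chosen]

means = a_post / (a_post + b_post)
print(np.argsort(means)[-3:])               # heads the posterior prefers
```

Sampling from the posterior rather than acting greedily is what lets the bandit keep exploring head combinations whose value is still uncertain.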
( 3
min )
We introduce a novel framework for analyzing reinforcement learning (RL) in
continuous state-action spaces, and use it to prove fast rates of convergence
in both off-line and on-line settings. Our analysis highlights two key
stability properties, relating to how changes in value functions and/or
policies affect the Bellman operator and occupation measures. We argue that
these properties are satisfied in many continuous state-action Markov decision
processes, and demonstrate how they arise naturally when using linear function
approximation methods. Our analysis offers fresh perspectives on the roles of
pessimism and optimism in off-line and on-line RL, and highlights the
connection between off-line RL and transfer learning.
( 2
min )
Digital image correlation (DIC) has become a valuable tool to monitor and
evaluate mechanical experiments on cracked specimens, but the automatic
detection of cracks is often difficult due to inherent noise and artefacts.
Machine learning models have been extremely successful in detecting crack paths
and crack tips using DIC-measured, interpolated full-field displacements as
input to a convolution-based segmentation model. Still, large amounts of data are needed to
train such models. However, scientific data is often scarce as experiments are
expensive and time-consuming. In this work, we present a method to directly
generate large amounts of artificial displacement data of cracked specimens
resembling real interpolated DIC displacements. The approach is based on
generative adversarial networks (GANs). During training, the discriminator
receives physical domain knowledge in the form of the derived von Mises
equivalent strain. We show that this physics-guided approach leads to improved
results in terms of visual quality of samples, sliced Wasserstein distance, and
geometry score when compared to a classical unguided GAN approach.
( 2
min )
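The derived von Mises equivalent strain that guides the discriminator can be computed from a full-field displacement map by finite differences. A minimal NumPy sketch under a small-strain, plane-strain assumption (the paper's exact derivation may differ):

```python
import numpy as np

def von_mises_strain(u, v, dx=1.0, dy=1.0):
    """Von Mises equivalent strain from 2D displacement fields u, v
    indexed as [y, x] (small-strain, plane-strain assumption)."""
    # displacement gradients via central differences
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)
    # strain tensor components (plane strain: eps_zz = 0)
    exx, eyy = du_dx, dv_dy
    exy = 0.5 * (du_dy + dv_dx)
    # deviatoric part; trace of the strain tensor is exx + eyy
    tr = (exx + eyy) / 3.0
    sxx, syy, szz = exx - tr, eyy - tr, -tr
    return np.sqrt(2.0 / 3.0 * (sxx**2 + syy**2 + szz**2 + 2.0 * exy**2))
```

For a uniform uniaxial stretch u = εx, v = 0, this returns the constant value (2/3)ε everywhere, which is a quick sanity check on the implementation.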
We propose a novel method for privacy-preserving deep neural networks (DNNs)
with the Vision Transformer (ViT). The method allows us not only to train
models and test with visually protected images but also to avoid the
performance degradation caused by the use of encrypted images, whereas
conventional methods cannot avoid the influence of image encryption. A domain
adaptation method is used to efficiently fine-tune ViT with encrypted images.
In experiments, the method is demonstrated to outperform conventional methods
in an image classification task on the CIFAR-10 and ImageNet datasets in terms
of classification accuracy.
( 2
min )
Federated learning (FL) is a promising technology via which some edge
devices/clients collaboratively train a machine learning model orchestrated by
a server. Learning an unfair model is known as a critical problem in federated
learning, where the trained model may unfairly advantage or disadvantage some
of the devices. To tackle this problem, in this work, we propose AdaFed. The
goal of AdaFed is to find an updating direction for the server along which (i)
all the clients' loss functions are decreasing; and (ii) more importantly, the
loss functions of the clients with larger loss values decrease at a higher rate.
AdaFed adaptively tunes this common direction based on the values of local
gradients and loss functions. We validate the effectiveness of AdaFed on a
suite of federated datasets, and demonstrate that AdaFed outperforms
state-of-the-art fair FL methods.
( 2
min )
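As a rough illustration of the idea (not AdaFed's actual adaptive tuning rule), a server could weight client gradients by their current loss values so that higher-loss clients steer the common update direction more strongly:

```python
import numpy as np

def fair_update_direction(client_grads, client_losses):
    """Loss-weighted combination of client gradients (illustrative sketch).

    Clients with larger losses receive larger weights, so the common
    direction favors decreasing their loss functions faster.
    """
    w = np.asarray(client_losses, dtype=float)
    w = w / w.sum()
    return sum(wi * g for wi, g in zip(w, client_grads))
```

A client's loss decreases along the common direction d whenever its gradient satisfies g_i · d > 0; AdaFed's contribution is tuning the weights so this holds for all clients while prioritizing the worst-off ones.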
Pandemics, notably the recent COVID-19 outbreak, have impacted both public
health and the global economy. A profound understanding of disease progression
and efficient response strategies is thus needed to prepare for potential
future outbreaks. In this paper, we emphasize the potential of Agent-Based
Models (ABM) in capturing complex infection dynamics and understanding the
impact of interventions. We simulate realistic pharmaceutical, behavioral, and
digital interventions that mirror challenges in real-world policy adoption and
suggest a holistic combination of these interventions for pandemic response.
Using these simulations, we study the trends of emergent behavior on a
large-scale population based on real-world socio-demographic and geo-census
data from Kings County in Washington. Our analysis reveals the pivotal role of
the initial 100 days in dictating a pandemic's course, emphasizing the
importance of quick decision-making and efficient policy development. Further,
we highlight that investing in behavioral and digital interventions can reduce
the burden on pharmaceutical interventions by reducing the total number of
infections and hospitalizations, and by delaying the pandemic's peak. We also
infer that allocating the same amount of dollars towards extensive testing with
contact tracing and self-quarantine offers greater cost efficiency compared to
spending the entire budget on vaccinations.
( 3
min )
Improving energy efficiency in industrial production processes is crucial for
competitiveness, and compliance with climate policies. This paper introduces a
data-driven approach to identify optimal melting patterns in induction
furnaces. Through time-series K-means clustering, the melting patterns were
classified into distinct clusters based on temperature profiles. Using the
elbow method, 12 clusters were identified, representing the range of melting
patterns. Performance parameters such as melting time, energy-specific
performance, and carbon cost were established for each cluster, indicating
furnace efficiency and environmental impact. Multiple criteria decision-making
methods including Simple Additive Weighting, Multiplicative Exponential
Weighting, Technique for Order of Preference by Similarity to Ideal Solution,
modified TOPSIS, and VlseKriterijumska Optimizacija I Kompromisno Resenje were
utilized to determine the best-practice cluster. The study successfully
identified the cluster with the best performance. Implementing the best
practice operation resulted in an 8.6% reduction in electricity costs,
highlighting the potential energy savings in the foundry.
( 2
min )
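TOPSIS, one of the decision-making methods listed, ranks alternatives by their closeness to an ideal solution. A compact NumPy sketch of the standard method, applied to hypothetical cluster performance data (melting time and energy use as cost criteria; the paper's actual criteria values are not reproduced here):

```python
import numpy as np

def topsis(X, weights, benefit):
    """Rank alternatives by closeness to the ideal solution.

    X: (n_alternatives, n_criteria) decision matrix
    weights: criterion weights; benefit: True for benefit criteria,
    False for cost criteria. Returns closeness scores (higher = better).
    """
    R = X / np.linalg.norm(X, axis=0)          # vector-normalize each column
    V = R * weights                            # weighted normalized matrix
    ideal = np.where(benefit, V.max(0), V.min(0))
    anti = np.where(benefit, V.min(0), V.max(0))
    d_pos = np.linalg.norm(V - ideal, axis=1)  # distance to ideal
    d_neg = np.linalg.norm(V - anti, axis=1)   # distance to anti-ideal
    return d_neg / (d_pos + d_neg)
```

With both criteria treated as costs, the cluster dominating on both (lowest melting time and energy) receives the maximum closeness score of 1.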
This Letter introduces an approach for precisely designing surface friction
properties using a conditional generative machine learning model, specifically
a diffusion denoising probabilistic model (DDPM). We created a dataset of
synthetic surfaces with frictional properties determined by molecular dynamics
simulations, which trained the DDPM to predict surface structures from desired
frictional outcomes. Unlike traditional trial-and-error and numerical
optimization methods, our approach directly yields surface designs meeting
specified frictional criteria with high accuracy and efficiency. This
advancement in material surface engineering demonstrates the potential of
machine learning in reducing the iterative nature of surface design processes.
Our findings not only provide a new pathway for precise surface property
tailoring but also suggest broader applications in material science where
surface characteristics are critical.
( 2
min )
Cooperative multi-agent reinforcement learning is a powerful tool to solve
many real-world cooperative tasks, but restrictions of real-world applications
may require training the agents in a fully decentralized manner. Due to the
lack of information about other agents, it is challenging to derive algorithms
that can converge to the optimal joint policy in a fully decentralized setting.
Thus, this research area has not been thoroughly studied. In this paper, we
seek to systematically review the fully decentralized methods in two settings:
maximizing a shared reward of all agents and maximizing the sum of individual
rewards of all agents, and discuss open questions and future research
directions.
( 2
min )
This paper proposes a classification framework aimed at identifying
correlations between job ad requirements and transversal skill sets, with a
focus on predicting the necessary skills for individual job descriptions using
a deep learning model. The approach involves data collection, preprocessing,
and labeling using ESCO (European Skills, Competences, and Occupations)
taxonomy. Hierarchical classification and multi-label strategies are used for
skill identification, while augmentation techniques address data imbalance,
enhancing model robustness. A comparison between results obtained with
English-specific and multi-language sentence embedding models reveals close
accuracy. The experimental case studies detail neural network configurations,
hyperparameters, and cross-validation results, highlighting the efficacy of the
hierarchical approach and the suitability of the multi-language model for the
diverse European job market. Thus, a new approach is proposed for the
hierarchical classification of transversal skills from job ads.
( 2
min )
Federated learning (FL) is an emerging paradigm for decentralized training of
machine learning models on distributed clients, without revealing the data to
the central server. The learning scheme may be horizontal, vertical or hybrid
(both vertical and horizontal). Most existing research work with deep neural
network (DNN) modelling is focused on horizontal data distributions, while
vertical and hybrid schemes are much less studied. In this paper, we propose a
generalized algorithm FedEmb, for modelling vertical and hybrid DNN-based
learning. The idea of our algorithm is characterised by higher inference
accuracy, stronger privacy-preserving properties, and lower client-server
communication bandwidth demands as compared with existing work. The
experimental results show that FedEmb is an effective method to tackle both
split feature & subject space decentralized problems, achieves 0.3% to 4.2%
higher inference accuracy with limited privacy leakage for datasets stored on
local clients, and reduces time complexity by 88.9% over the vertical baseline
method.
( 3
min )
Algorithmic reproducibility measures the deviation in outputs of machine
learning algorithms upon minor changes in the training process. Previous work
suggests that first-order methods would need to trade-off convergence rate
(gradient complexity) for better reproducibility. In this work, we challenge
this perception and demonstrate that both optimal reproducibility and
near-optimal convergence guarantees can be achieved for smooth convex
minimization and smooth convex-concave minimax problems under various
error-prone oracle settings. Particularly, given the inexact initialization
oracle, our regularization-based algorithms achieve the best of both worlds -
optimal reproducibility and near-optimal gradient complexity - for minimization
and minimax optimization. With the inexact gradient oracle, the near-optimal
guarantees also hold for minimax optimization. Additionally, with the
stochastic gradient oracle, we show that stochastic gradient descent ascent is
optimal in terms of both reproducibility and gradient complexity. We believe
our results contribute to an enhanced understanding of the
reproducibility-convergence trade-off in the context of convex optimization.
( 2
min )
We consider the sequential decision-making problem where the mean outcome is
a non-linear function of the chosen action. Compared with the linear model, two
curious phenomena arise in non-linear models: first, in addition to the
"learning phase" with a standard parametric rate for estimation or regret,
there is a "burn-in period" with a fixed cost determined by the non-linear
function; second, achieving the smallest burn-in cost requires new exploration
algorithms. For a special family of non-linear functions named ridge functions
in the literature, we derive upper and lower bounds on the optimal burn-in
cost, and in addition, on the entire learning trajectory during the burn-in
period via differential equations. In particular, a two-stage algorithm that
first finds a good initial action and then treats the problem as locally linear
is statistically optimal. In contrast, several classical algorithms, such as
UCB and algorithms relying on regression oracles, are provably suboptimal.
( 2
min )
We develop a new framework for embedding joint probability distributions in
tensor product reproducing kernel Hilbert spaces (RKHS). Our framework
accommodates a low-dimensional, normalized and positive model of a
Radon-Nikodym derivative, which we estimate from sample sizes of up to several
million data points, alleviating the inherent limitations of RKHS modeling.
Well-defined normalized and positive conditional distributions are natural
by-products to our approach. The embedding is fast to compute and accommodates
learning problems ranging from prediction to classification. Our theoretical
findings are supplemented by favorable numerical results.
( 2
min )
We propose a hierarchical correlation clustering method that extends the
well-known correlation clustering to produce hierarchical clusters applicable
to both positive and negative pairwise dissimilarities. We then study
unsupervised representation learning with such hierarchical
correlation clustering. For this purpose, we first investigate embedding the
respective hierarchy to be used for tree-preserving embedding and feature
extraction. Thereafter, we study the extension of minimax distance measures to
correlation clustering, as another representation learning paradigm. Finally,
we demonstrate the performance of our methods on several datasets.
( 2
min )
Machine learning models typically focus on specific targets like creating
classifiers, often based on known population feature distributions in a
business context. However, models calculating individual features adapt over
time to improve precision, introducing the concept of decoupling: shifting from
point evaluation to data distribution. We use calibration strategies as a
means of decoupling machine learning (ML) classifiers from score-based
actions within business logic frameworks. To evaluate these strategies, we
perform a comparative analysis using a real-world business scenario and
multiple ML models. Our findings highlight the trade-offs and performance
implications of the approach, offering valuable insights for practitioners
seeking to optimize their decoupling efforts. In particular, the Isotonic and
Beta calibration methods stand out for scenarios in which there is shift
between training and testing data.
( 2
min )
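Isotonic calibration, one of the two methods the abstract highlights, fits a monotone map from raw scores to probabilities, typically via the pool-adjacent-violators (PAV) algorithm. A minimal pure-Python sketch (scikit-learn's IsotonicRegression is the usual production choice):

```python
def isotonic_calibrate(scores, labels):
    """Pool-adjacent-violators fit of P(y=1) as a monotone function of score."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    # each block holds [mean label, weight]; merge adjacent violators
    blocks = []
    for i in order:
        blocks.append([float(labels[i]), 1])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    # expand blocks back to per-example calibrated probabilities
    fitted = []
    for m, w in blocks:
        fitted.extend([m] * w)
    calibrated = [0.0] * n
    for pos, i in enumerate(order):
        calibrated[i] = fitted[pos]
    return calibrated
```

When a lower score carries a higher label than its neighbor, PAV pools the two into their weighted average, which is what makes the fitted map monotone and robust to score shift between training and testing data.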
The zero-shot text-to-speech (TTS) method, based on speaker embeddings
extracted from reference speech using self-supervised learning (SSL) speech
representations, can reproduce speaker characteristics very accurately.
However, this approach suffers from degradation in speech synthesis quality
when the reference speech contains noise. In this paper, we propose a
noise-robust zero-shot TTS method. We incorporated adapters into the SSL model,
which we fine-tuned with the TTS model using noisy reference speech. In
addition, to further improve performance, we adopted a speech enhancement (SE)
front-end. With these improvements, our proposed SSL-based zero-shot TTS
achieved high-quality speech synthesis with noisy reference speech. Through the
objective and subjective evaluations, we confirmed that the proposed method is
highly robust to noise in reference speech, and effectively works in
combination with SE.
( 2
min )
Uncertainty estimation is increasingly attractive for improving the
reliability of neural networks. In this work, we present novel credal-set
interval neural networks (CreINNs) designed for classification tasks. CreINNs
preserve the traditional interval neural network structure, capturing weight
uncertainty through deterministic intervals, while forecasting credal sets
using the mathematical framework of probability intervals. Experimental
validations on an out-of-distribution detection benchmark (CIFAR10 vs SVHN)
show that CreINNs achieve superior epistemic uncertainty estimation compared
to variational Bayesian neural networks (BNNs) and deep ensembles (DEs).
Furthermore, CreINNs exhibit a notable reduction in computational complexity
compared to variational BNNs and demonstrate smaller model sizes than DEs.
( 2
min )
To handle the complexities of irregular and incomplete time series data, we
propose an invertible Neural Differential Equation (NDE)-based method. While
NDE-based methods are powerful for analyzing irregularly-sampled time series,
they typically do not guarantee reversible transformations in their standard
form. Our method proposes a variation of Neural Controlled Differential
Equations (Neural CDEs) with Neural Flow, which ensures invertibility while
maintaining a lower computational burden. Additionally, it enables the
training of a dual latent space, enhancing the modeling of temporal dynamics.
Our research presents an advanced
framework that excels in both classification and interpolation tasks. At the
core of our approach is an enhanced dual latent states architecture, carefully
designed for high precision across various time series tasks. Empirical
analysis demonstrates that our method significantly outperforms existing
models. This work significantly advances irregular time series analysis,
introducing innovative techniques and offering a versatile tool for diverse
practical applications.
( 2
min )
I introduce a unified framework for interpreting neural network classifiers
tailored toward automated scientific discovery. In contrast to neural
network-based regression, for classification, it is in general impossible to
find a one-to-one mapping from the neural network to a symbolic equation even
if the neural network itself bases its classification on a quantity that can be
written as a closed-form equation. In this paper, I embed a trained neural
network into an equivalence class of classifying functions that base their
decisions on the same quantity. I interpret neural networks by finding an
intersection between this equivalence class and human-readable equations
defined by the search space of symbolic regression. The approach is not limited
to classifiers or full neural networks and can be applied to arbitrary neurons
in hidden layers or latent spaces or to simplify the process of interpreting
neural network regressors.
( 2
min )
In this work, we provide a simulation algorithm to simulate from a
(multivariate) characteristic function, which is only accessible in a black-box
format. We construct a generative neural network, whose loss function exploits
a specific representation of the Maximum-Mean-Discrepancy metric to directly
incorporate the targeted characteristic function. The construction is universal
in the sense that it is independent of the dimension and that it does not
require any assumptions on the given characteristic function. Furthermore,
finite sample guarantees on the approximation quality in terms of the
Maximum-Mean Discrepancy metric are derived. The method is illustrated in a
short simulation study.
( 2
min )
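The representation being exploited is that the Gaussian-kernel MMD between two distributions can be written as a weighted L2 distance between their characteristic functions. A hedged NumPy sketch of such a loss, comparing the empirical characteristic function of generated samples against a black-box target CF at frequencies sampled from the kernel's spectral measure:

```python
import numpy as np

def cf_mmd_loss(samples, target_cf, freqs):
    """Monte-Carlo estimate of E_t |phi_hat(t) - phi_target(t)|^2.

    samples: (n, d) generated points; freqs: (m, d) sampled frequencies;
    target_cf: callable mapping (m, d) frequencies to (m,) CF values.
    """
    proj = freqs @ samples.T                   # (m, n) inner products <t, x>
    phi_hat = np.exp(1j * proj).mean(axis=1)   # empirical characteristic fn
    return float(np.mean(np.abs(phi_hat - target_cf(freqs)) ** 2))
```

Because the loss only queries `target_cf` pointwise, it works with a characteristic function accessible purely as a black box, matching the setting of the paper.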
Skin cancer is a global health concern, necessitating early and accurate
diagnosis for improved patient outcomes. This study introduces a groundbreaking
approach to skin cancer classification, employing the Vision Transformer, a
state-of-the-art deep learning architecture renowned for its success in diverse
image analysis tasks. Utilizing the HAM10000 dataset of 10,015 meticulously
annotated skin lesion images, the model undergoes preprocessing for enhanced
robustness. The Vision Transformer, adapted to the skin cancer classification
task, leverages the self-attention mechanism to capture intricate spatial
dependencies, achieving superior performance over traditional deep learning
architectures. Segment Anything Model aids in precise segmentation of cancerous
areas, attaining high IOU and Dice Coefficient. Extensive experiments highlight
the model's supremacy, particularly the Google-based ViT patch-32 variant,
which achieves 96.15% accuracy and showcases potential as an effective tool for
dermatologists in skin cancer diagnosis, contributing to advancements in
dermatological practices.
( 2
min )
Observational cohort studies are increasingly being used for comparative
effectiveness research to assess the safety of therapeutics. Recently, various
doubly robust methods have been proposed for average treatment effect
estimation by combining the treatment model and the outcome model via different
vehicles, such as matching, weighting, and regression. The key advantage of
doubly robust estimators is that they require either the treatment model or the
outcome model to be correctly specified to obtain a consistent estimator of
average treatment effects, and therefore lead to a more accurate and often more
precise inference. However, little work has been done to understand how doubly
robust estimators differ due to their unique strategies of using the treatment
and outcome models and how machine learning techniques can be combined to boost
their performance. Here we examine multiple popular doubly robust methods and
compare their performance using different treatment and outcome modeling via
extensive simulations and a real-world application. We found that incorporating
machine learning with doubly robust estimators such as the targeted maximum
likelihood estimator gives the best overall performance. Practical guidance on
how to apply doubly robust estimators is provided.
( 3
min )
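A canonical doubly robust estimator is the augmented inverse-probability-weighted (AIPW) estimator, which combines a propensity model with outcome regressions. A minimal NumPy sketch (illustrative; the paper compares several such estimators, including targeted maximum likelihood):

```python
import numpy as np

def aipw_ate(y, a, ps, mu1, mu0):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y: outcomes; a: binary treatment indicator; ps: estimated propensity
    P(A=1|X); mu1, mu0: outcome-model predictions E[Y|A=1,X], E[Y|A=0,X].
    Consistent if either the propensity or the outcome model is correct.
    """
    t1 = a * (y - mu1) / ps + mu1
    t0 = (1 - a) * (y - mu0) / (1 - ps) + mu0
    return float(np.mean(t1 - t0))
```

Machine learning enters by plugging flexible learners into `ps`, `mu1`, and `mu0`, which is the combination the simulations evaluate.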
The PGA TOUR continues to enhance the golf experience with real-time data that brings fans closer to the game. To deliver even richer experiences, they are pursuing the development of a next-generation ball position tracking system that automatically tracks the position of the ball on the green. The TOUR currently uses ShotLink powered by CDW, […]
( 9
min )
AI-backed virtual assistants face challenges in handling complex data structures. TaskWeaver helps users build assistants that understand diverse domain questions, follow examples, and efficiently execute customizable algorithms on complex data structures.
The post TaskWeaver: A code-first agent framework for efficient data analytics and domain adaptation appeared first on Microsoft Research.
( 12
min )
The financial services industry is undergoing a significant transformation with the adoption of AI technologies. NVIDIA’s fourth annual State of AI in Financial Services Report provides insights into the current landscape and emerging trends for 2024. The report reveals that an overwhelming 91% of financial services companies are either assessing AI or already using it.
( 7
min )
GFN Thursday recaps the latest cloud announcements from CES 2024 — Day Pass memberships, Cloud G-SYNC technology, expanded NVIDIA Reflex support and more. The new year brings new adventures to the cloud for members, including Diablo IV and Overwatch 2 from Blizzard, Exoprimal from Capcom, Honkai: Star Rail from HoYoverse and Pax Dei from Mainframe.
( 7
min )
In this paper we present a method for single-channel wind noise reduction
using our previously proposed diffusion-based stochastic regeneration model
combining predictive and generative modelling. We introduce a non-additive
speech in noise model to account for the non-linear deformation of the membrane
caused by the wind flow and possible clipping. We show that our stochastic
regeneration model outperforms other neural-network-based wind noise reduction
methods as well as purely predictive and generative models, on a dataset using
simulated and real-recorded wind noise. We further show that the proposed
method generalizes well by testing on an unseen dataset with real-recorded wind
noise. Audio samples, data generation scripts and code for the proposed methods
can be found online (https://uhh.de/inf-sp-storm-wind).
( 2
min )
We present a new fab-in-the-loop reinforcement learning algorithm for the
design of nano-photonic components that accounts for the imperfections present
in nanofabrication processes. As a demonstration of the potential of this
technique, we apply it to the design of photonic crystal grating couplers
fabricated on an air clad 220 nm silicon on insulator single etch platform.
This fab-in-the-loop algorithm improves the insertion loss from 8.8 to 3.24 dB.
The widest bandwidth designs produced using our fab-in-the-loop algorithm can
cover a 150 nm bandwidth with less than 10.2 dB of loss at their lowest point.
( 2
min )
Interactive segmentation is a crucial research area in medical image analysis
aiming to boost the efficiency of costly annotations by incorporating human
feedback. This feedback takes the form of clicks, scribbles, or masks and
allows for iterative refinement of the model output so as to efficiently guide
the system towards the desired behavior. In recent years, deep learning-based
approaches have propelled results to a new level causing a rapid growth in the
field with 121 methods proposed in the medical imaging domain alone. In this
review, we provide a structured overview of this emerging field featuring a
comprehensive taxonomy, a systematic review of existing methods, and an
in-depth analysis of current practices. Based on these contributions, we
discuss the challenges and opportunities in the field. For instance, we find
that there is a severe lack of comparison across methods which needs to be
tackled by standardized baselines and benchmarks.
( 3
min )
The increasing use of Advanced Language Models (ALMs) in diverse sectors,
particularly due to their impressive capability to generate top-tier content
following linguistic instructions, forms the core of this investigation. This
study probes into ALMs' deployment in electronic hardware design, with a
specific emphasis on the synthesis and enhancement of Verilog programming. We
introduce an innovative framework, crafted to assess and amplify ALMs'
productivity in this niche. The methodology commences with the initial crafting
of Verilog programming via ALMs, succeeded by a distinct dual-stage refinement
protocol. The premier stage prioritizes augmenting the code's operational and
linguistic precision, while the latter stage is dedicated to aligning the code
with Power-Performance-Area (PPA) benchmarks, a pivotal component in proficient
hardware design. This bifurcated strategy, merging error remediation with PPA
enhancement, has yielded substantial upgrades in the caliber of ALM-created
Verilog programming. Our framework achieves 81.37% linguistic accuracy and
62.0% operational efficacy in programming synthesis, surpassing current
leading-edge techniques, which achieve 73% linguistic accuracy and 46%
operational efficacy. These findings illuminate ALMs' aptitude in tackling
complex technical domains and signal a positive shift in the mechanization of
hardware design operations.
( 3
min )
Machine learning, particularly graph learning, is gaining increasing
recognition for its transformative impact across various fields. One such
promising application is in the realm of molecule design and discovery, notably
within the pharmaceutical industry. Our survey offers a comprehensive overview
of state-of-the-art methods in molecule design, particularly focusing on
\emph{de novo} drug design, which incorporates (deep) graph learning
techniques. We categorize these methods into three distinct groups: \emph{i)}
\emph{all-at-once}, \emph{ii)} \emph{fragment-based}, and \emph{iii)}
\emph{node-by-node}. Additionally, we introduce some key public datasets and
outline the commonly used evaluation metrics for both the generation and
optimization of molecules. In the end, we discuss the existing challenges in
this field and suggest potential directions for future research.
( 2
min )
In this paper, we present a novel training approach called the Homotopy
Relaxation Training Algorithm (HRTA), aimed at accelerating the training
process in contrast to traditional methods. Our algorithm incorporates two key
mechanisms: one involves building a homotopy activation function that
seamlessly connects the linear activation function with the ReLU activation
function; the other technique entails relaxing the homotopy parameter to
enhance the training refinement process. We have conducted an in-depth analysis
of this novel method within the context of the neural tangent kernel (NTK),
revealing significantly improved convergence rates. Our experimental results,
especially when considering networks with larger widths, validate the
theoretical conclusions. This proposed HRTA exhibits the potential for other
activation functions and deep neural networks.
( 2
min )
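One natural way to realize such a homotopy activation (an illustrative parameterization, assuming a convex path between the two endpoints; the paper may define it differently) is:

```python
import numpy as np

def homotopy_activation(x, lam):
    """Homotopy between the linear and ReLU activations.

    lam = 0 gives the identity (linear) activation, lam = 1 gives ReLU;
    intermediate lam yields a leaky-ReLU-shaped interpolation.
    """
    return lam * np.maximum(x, 0.0) + (1.0 - lam) * x
```

Relaxing the homotopy parameter during training then means scheduling `lam` from 0 toward 1 so the network starts near an easier, linear optimization landscape.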
The conservation of hydrological resources involves continuously monitoring
their contamination. A multi-agent system composed of autonomous surface
vehicles is proposed in this paper to efficiently monitor the water quality. To
achieve a safe control of the fleet, the fleet policy should be able to act
based on measurements and on the fleet state. It is proposed to use Local
Gaussian Processes and Deep Reinforcement Learning to jointly obtain effective
monitoring policies. Local Gaussian processes, unlike classical global Gaussian
processes, can model information with dissimilar spatial correlations, capturing
the water quality information more accurately. A deep convolutional policy is
proposed that bases its decisions on observations of the mean and variance of
this model, by means of an information gain reward. Using a Double Deep
Q-Learning algorithm, agents are trained to
minimize the estimation error in a safe manner thanks to a Consensus-based
heuristic. Simulation results indicate an improvement of up to 24% in terms of
the mean absolute error with the proposed models. Also, training results with
1-3 agents indicate that our proposed approach returns 20% and 24% smaller
average estimation errors for, respectively, monitoring water quality variables
and monitoring algae blooms, as compared to state-of-the-art approaches.
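The Double Deep Q-Learning target computation used to train the agents can be sketched as follows (a generic formulation; the convolutional networks, consensus heuristic, and information-gain reward are omitted):

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, gamma, dones):
    """Double DQN targets: select the next action with the online network,
    but evaluate it with the target network, reducing overestimation bias."""
    best_actions = np.argmax(next_q_online, axis=1)
    evals = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evals
```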
( 2
min )
Federated Learning (FL) is a promising distributed learning mechanism which
still faces two major challenges, namely privacy breaches and system
efficiency. In this work, we reconceptualize the FL system from the perspective
of network information theory, and formulate an original FL communication
framework, FedNC, which is inspired by Network Coding (NC). The main idea of
FedNC is mixing the information of the local models by making random linear
combinations of the original parameters, before uploading for further
aggregation. Due to the benefits of the coding scheme, both theoretical and
experimental analysis indicate that FedNC improves the performance of
traditional FL in several important ways, including security, efficiency, and
robustness. To the best of our knowledge, this is the first framework where NC
is introduced in FL. As FL continues to evolve within practical network
frameworks, more variants can be further designed based on FedNC.
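A minimal sketch of the coding step, assuming real-valued random linear combinations of flattened parameter vectors (practical network-coding schemes often operate over finite fields instead):

```python
import numpy as np

def encode_models(params, rng):
    """Mix K local parameter vectors via random linear combinations before
    upload, in the spirit of network coding. Returns coded packets plus the
    mixing coefficients needed for decoding. Illustrative sketch only."""
    K = len(params)
    coeffs = rng.standard_normal((K, K))  # almost surely invertible
    coded = coeffs @ np.stack(params)
    return coded, coeffs

def decode_models(coded, coeffs):
    """Recover the original parameter vectors from the coded packets."""
    return np.linalg.solve(coeffs, coded)
```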
( 2
min )
The problem of high-quality drought forecasting up to a year in advance is
critical for agriculture planning and insurance. Yet, it is still unsolved with
reasonable accuracy due to data complexity and aridity stochasticity. We tackle
drought data by introducing an end-to-end approach that adopts a
spatio-temporal neural network model with accessible open monthly climate data
as the input.
Our systematic research employs diverse proposed models and five distinct
environmental regions as a testbed to evaluate the efficacy of the Palmer
Drought Severity Index (PDSI) prediction. Key aggregated findings are the
exceptional performance of a Transformer model, EarthFormer, in making accurate
short-term (up to six months) forecasts. At the same time, the Convolutional
LSTM excels in longer-term forecasting. The models achieved ROC AUC scores of
0.948 for one-month-ahead and 0.617 for twelve-month-ahead forecasts.
( 2
min )
As the most basic application and implementation of deep learning, image
classification has grown in popularity. Various datasets are provided by
renowned data science communities for benchmarking machine learning algorithms
and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is
being used in this research for its overall acceptance and benchmark standards.
A comparison of various pre-trained models is demonstrated by using different
types of optimizers and loss functions. Hyper-parameters are changed to gain
the best result from a model. By applying this approach, we obtained higher
accuracy without major changes to the training model. To run the experiment, we
used three different computer architectures: a laptop equipped with NVIDIA
GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a
desktop equipped with NVIDIA GeForce RTX 3090. The acquired results demonstrate
superior accuracy compared to previous experiments on this dataset. The highest
accuracy, 99.65%, was achieved using NASNet Large.
( 2
min )
Three classes of architectures for time series prediction were tested. They
differ by their input layers, which contain either convolutional, LSTM, or dense
hypercomplex layers for 4D algebras. The input consisted of four related stock
market time series, and the task was to predict one of them. The optimization of
hyperparameters related to the classes of architectures was performed in order
to compare the best neural networks within the class. The results show that in
most cases, the architecture with a hypercomplex dense layer provides MAE
accuracy similar to the other architectures, however, with considerably fewer
trainable parameters. As a result, hypercomplex neural networks can be trained
and can process data faster than the other tested architectures. Moreover, the
order of the input time series has an impact on effectiveness.
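The parameter saving of a hypercomplex dense layer can be illustrated with a quaternion-style weight composition, where one 4m x 4n weight matrix is assembled from four shared m x n blocks, i.e. roughly 4x fewer trainable parameters than a real dense layer of the same size (one common sign convention among several used for 4D algebras):

```python
import numpy as np

def quaternion_linear(x, r, i, j, k):
    """Dense layer over a quaternion-like 4D algebra: the full weight matrix
    is built from 4 shared component blocks via a Hamilton-product pattern.
    r, i, j, k each have shape (m, n); the assembled W is (4m, 4n)."""
    W = np.block([
        [r, -i, -j, -k],
        [i,  r, -k,  j],
        [j,  k,  r, -i],
        [k, -j,  i,  r],
    ])
    return x @ W.T
```

A real dense layer of the same shape would need 16mn parameters; the quaternion layer shares 4mn across the block structure.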
( 2
min )
This study develops a graph search algorithm to find the optimal
discrimination path for the binary classification problem. The objective
function is defined as the difference of variations between the true positive
(TP) and false positive (FP). It uses the depth first search (DFS) algorithm to
find the top-down paths for discrimination. It proposes a dynamic optimization
procedure to optimize TP at the upper levels and then reduce FP at the lower
levels. To accelerate computing speed with improving accuracy, it proposes a
reduced histogram algorithm with variable bin size instead of looping over all
data points, to find the feature threshold of discrimination. The algorithm is
applied on top of a Support Vector Machine (SVM) model for a binary
classification problem on whether a person is fit or unfit. It significantly
improves TP and reduces FP of the SVM results (e.g., reduced FP by 90% with a
loss of only 5% TP). The graph search auto-generates 39 ranked discrimination
paths within 9 seconds on an input of 328,464 objects in total, using a
dual-core laptop computer with a 2.59 GHz processor.
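The histogram-based threshold search can be sketched as follows: bin the feature values once and scan cumulative counts instead of looping over all data points. The objective and the fixed bin count here are illustrative simplifications of the paper's variable-bin-size scheme:

```python
import numpy as np

def best_threshold(scores_pos, scores_neg, bins=16):
    """Find a feature threshold maximizing TP - FP from histograms rather
    than individual points (sketch of the reduced-histogram idea)."""
    lo = min(scores_pos.min(), scores_neg.min())
    hi = max(scores_pos.max(), scores_neg.max())
    edges = np.linspace(lo, hi, bins + 1)
    # Reversed cumulative sums: counts of points at or above each left edge.
    tp = np.histogram(scores_pos, bins=edges)[0][::-1].cumsum()[::-1]
    fp = np.histogram(scores_neg, bins=edges)[0][::-1].cumsum()[::-1]
    best = int(np.argmax(tp - fp))
    return edges[best], int(tp[best]), int(fp[best])
```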
( 2
min )
With the rapid increase in the number of Anthropogenic Space Objects (ASOs),
Low Earth Orbit (LEO) is facing significant congestion, thereby posing
challenges to space operators and risking the viability of the space
environment for varied uses. Current models for examining this evolution, while
detailed, are computationally demanding. To address these issues, we propose a
novel machine learning-based model, as an extension of the MIT Orbital Capacity
Tool (MOCAT). This advanced model is designed to accelerate the propagation of
ASO density distributions, and it is trained on hundreds of simulations
generated by an established and accurate model of the space environment
evolution. We study how different deep learning-based solutions can potentially
be good candidates for ASO propagation and manage the high-dimensionality of
the data. To assess the model's capabilities, we conduct experiments in
long-term forecasting scenarios (around 100 years), analyze how and why the
performance degrades over time, and discuss potential improvements.
( 2
min )
We use Koopman theory for data-driven model reduction of nonlinear dynamical
systems with controls. We propose generic model structures combining
delay-coordinate encoding of measurements and full-state decoding to integrate
reduced Koopman modeling and state estimation. We present a deep-learning
approach to train the proposed models. A case study demonstrates that our
approach provides accurate control models and enables real-time capable
nonlinear model predictive control of a high-purity cryogenic distillation
column.
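The delay-coordinate encoding step can be sketched as a Takens-style embedding that stacks consecutive measurements into an augmented state, on which a linear (Koopman) model with controls is then fitted; this helper is an assumption-level illustration, not the paper's full architecture:

```python
import numpy as np

def delay_embed(y, d):
    """Stack d consecutive measurements into delay-coordinate states.
    Row j of the result is [y[j], y[j+1], ..., y[j+d-1]]."""
    return np.stack([y[i:len(y) - d + i + 1] for i in range(d)], axis=1)
```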
( 2
min )
In this paper, we present results on improving out-of-domain weather
prediction and uncertainty estimation as part of the \texttt{Shifts Challenge
on Robustness and Uncertainty under Real-World Distributional Shift}. We find
that by leveraging a mixture of experts in conjunction with an advanced data
augmentation technique borrowed from the computer vision domain, together with
robust \textit{post-hoc} calibration of predictive uncertainties, we can
potentially achieve more accurate and better-calibrated
results with deep neural networks than with boosted tree models for tabular
data. We quantify our predictions using several metrics and propose several
future lines of inquiry and experimentation to boost performance.
( 2
min )
GANStrument, exploiting GANs with a pitch-invariant feature extractor and
instance conditioning technique, has shown remarkable capabilities in
synthesizing realistic instrument sounds. To further improve the reconstruction
ability and pitch accuracy, and thereby enhance the editability of user-provided
sounds,
we propose HyperGANStrument, which introduces a pitch-invariant hypernetwork to
modulate the weights of a pre-trained GANStrument generator, given a one-shot
sound as input. The hypernetwork modulation provides feedback for the generator
in the reconstruction of the input sound. In addition, we take advantage of an
adversarial fine-tuning scheme for the hypernetwork to improve the
reconstruction fidelity and generation diversity of the generator. Experimental
results show that the proposed model not only enhances the generation
capability of GANStrument but also significantly improves the editability of
synthesized sounds. Audio examples are available at the online demo page.
( 2
min )
Federated Learning (FL) has become an established technique to facilitate
privacy-preserving collaborative training. However, new approaches to FL often
discuss their contributions involving small deep-learning models only. With the
tremendous success of transformer models, the following question arises: What
is necessary to operationalize foundation models in an FL application? Knowing
that computation and communication often take up similar amounts of time in FL,
we introduce a novel taxonomy focused on computational and communication
efficiency methods in FL applications. These methods aim to optimize
the training time and reduce communication between clients and the server. We
also look at the current state of widely used FL frameworks and discuss future
research potentials based on existing approaches in FL research and beyond.
( 2
min )
The success of drug discovery and development relies on the precise
prediction of molecular activities and properties. While in silico molecular
property prediction has shown remarkable potential, its use has been limited so
far to assays for which large amounts of data are available. In this study, we
use a fine-tuned large language model to integrate biological assays based on
their textual information, coupled with Barlow Twins, a Siamese neural network
using a novel self-supervised learning approach. This architecture uses both
assay information and molecular fingerprints to extract the true molecular
information. TwinBooster enables the prediction of properties of unseen
bioassays and molecules, achieving state-of-the-art zero-shot learning
performance.
Remarkably, our artificial intelligence pipeline shows excellent performance on
the FS-Mol benchmark. This breakthrough demonstrates the application of deep
learning to critical property prediction tasks where data is typically scarce.
By accelerating the early identification of active molecules in drug discovery
and development, this method has the potential to help streamline the
identification of novel therapeutics.
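The Barlow Twins objective underlying this pipeline drives the cross-correlation matrix of two embedding views toward the identity (invariance on the diagonal, redundancy reduction off-diagonal). A minimal numpy sketch of the standard loss, not TwinBooster's full architecture:

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins loss on two batches of embeddings of shape (n, d)."""
    # Standardize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n, d = z1.shape
    c = z1.T @ z2 / n  # cross-correlation matrix, target: identity
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag
```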
( 2
min )
In this paper, we first extend the result of FL93 and prove universal
consistency for a classification rule based on wide and deep ReLU neural
networks trained on the logistic loss. Unlike the approach in FL93 that
decomposes the estimation and empirical error, we directly analyze the
classification risk based on the observation that a realization of a neural
network that is wide enough is capable of interpolating an arbitrary number of
points. Secondly, we give sufficient conditions for a class of probability
measures under which classifiers based on neural networks achieve minimax
optimal rates of convergence. Our result is motivated from the practitioner's
observation that neural networks are often trained to achieve 0 training error,
which is the case for our proposed neural network classifiers. Our proofs hinge
on recent developments in empirical risk minimization and on approximation
rates of deep ReLU neural networks for various function classes of interest.
Applications to classical function spaces of smoothness illustrate the
usefulness of our result.
( 2
min )
Deep reinforcement learning (DRL) methods have recently shown promise in path
planning tasks. However, when dealing with global planning tasks, these methods
face serious challenges such as poor convergence and generalization. To this
end, we propose an attention-enhanced DRL method called LOPA (Learn Once Plan
Arbitrarily) in this paper. Firstly, we analyze the reasons for these problems
from the perspective of DRL's observation, revealing that the traditional
design causes DRL to be disturbed by irrelevant map information. Secondly, we
develop the LOPA which utilizes a novel attention-enhanced mechanism to attain
an improved attention capability towards the key information of the
observation. Such a mechanism is realized by two steps: (1) an attention model
is built to transform the DRL's observation into two dynamic views: local and
global, significantly guiding the LOPA to focus on the key information on the
given maps; (2) a dual-channel network is constructed to process these two
views and integrate them to attain an improved reasoning capability. The LOPA
is validated via multi-objective global path planning experiments. The result
suggests the LOPA has improved convergence and generalization performance as
well as great path planning efficiency.
( 2
min )
This chapter provides a comprehensive overview of the pragmatic aspects
involved in organizing AI competitions. We begin by discussing strategies to
incentivize participation, touching upon effective communication techniques,
aligning with trending topics in the field, structuring awards, potential
recruitment opportunities, and more. We then turn to community engagement,
organizational best practices, and effective means of disseminating challenge
outputs. Lastly, the chapter addresses the logistics, expanding on costs,
required manpower, and resource allocation for effectively
managing and executing a challenge. By examining these practical problems,
readers will gain actionable insights to navigate the multifaceted landscape of
AI competition organization, from inception to completion.
( 2
min )
We propose a methodology, based on machine learning and optimization, for
selecting a solver configuration for a given instance. First, we employ a set
of solved instances and configurations in order to learn a performance function
of the solver. Secondly, we formulate a mixed-integer nonlinear program where
the objective/constraints explicitly encode the learnt information, and which
we solve, upon the arrival of an unknown instance, to find the best solver
configuration for that instance, based on the performance function. The main
novelty of our approach lies in the fact that the configuration set search
problem is formulated as a mathematical program, which allows us to a) enforce
hard dependence and compatibility constraints on the configurations, and b)
solve it efficiently with off-the-shelf optimization tools.
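For small configuration sets, the selection step reduces to minimizing the learnt performance function over feasible configurations; this exhaustive sketch conveys the idea (the paper instead encodes it as a mixed-integer nonlinear program, which scales to large, constrained configuration spaces):

```python
def best_configuration(configs, predict_time, compatible):
    """Pick the configuration minimizing a learnt performance function,
    subject to hard compatibility constraints. Brute-force illustration."""
    feasible = [c for c in configs if compatible(c)]
    return min(feasible, key=predict_time)
```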
( 2
min )
We study the approximation capacity of some variation spaces corresponding to
shallow ReLU$^k$ neural networks. It is shown that sufficiently smooth
functions are contained in these spaces with finite variation norms. For
functions with less smoothness, the approximation rates in terms of the
variation norm are established. Using these results, we are able to prove the
optimal approximation rates in terms of the number of neurons for shallow
ReLU$^k$ neural networks. It is also shown how these results can be used to
derive approximation bounds for deep neural networks and convolutional neural
networks (CNNs). As applications, we study convergence rates for nonparametric
regression using three ReLU neural network models: shallow neural network,
over-parameterized neural network, and CNN. In particular, we show that shallow
neural networks can achieve the minimax optimal rates for learning H\"older
functions, which complements recent results for deep neural networks. It is
also proven that over-parameterized (deep or shallow) neural networks can
achieve nearly optimal rates for nonparametric regression.
( 2
min )
Graph Neural Networks (GNNs) are able to achieve high classification accuracy
on many important real world datasets, but provide no rigorous notion of
predictive uncertainty. Quantifying the confidence of GNN models is difficult
due to the dependence between datapoints induced by the graph structure. We
leverage recent advances in conformal prediction to construct prediction sets
for node classification in inductive learning scenarios. We do this by taking
an existing approach for conformal classification that relies on
\textit{exchangeable} data and modifying it by appropriately weighting the
conformal scores to reflect the network structure. We show through experiments
on standard benchmark datasets using popular GNN models that our approach
provides tighter and better calibrated prediction sets than a naive application
of conformal prediction.
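The weighted-quantile step at the heart of this construction can be sketched as follows; the conformity score (1 minus the softmax probability of the true class) and the weighting are generic simplifications of the paper's network-structure-aware weights:

```python
import numpy as np

def conformal_prediction_set(cal_scores, weights, test_probs, alpha=0.1):
    """Build a prediction set from weighted calibration conformity scores:
    keep every class whose score falls below the weighted (1-alpha) quantile."""
    order = np.argsort(cal_scores)
    w = weights[order] / weights.sum()
    q_idx = np.searchsorted(np.cumsum(w), 1 - alpha)
    qhat = cal_scores[order][min(q_idx, len(cal_scores) - 1)]
    return [k for k, p in enumerate(test_probs) if 1 - p <= qhat]
```

With uniform weights this recovers standard split conformal classification; non-uniform weights shift the quantile to reflect dependence induced by the graph.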
( 2
min )
The constantly increasing capabilities of artificial intelligence (AI) open
new possibilities for human-AI collaboration. One promising approach to
leverage existing complementary capabilities is allowing humans to delegate
individual instances to the AI. However, enabling humans to delegate instances
effectively requires them to assess both their own and the AI's capabilities in
the context of the given task. In this work, we explore the effects of
providing contextual information on human decisions to delegate instances to an
AI. We find that providing participants with contextual information
significantly improves the human-AI team performance. Additionally, we show
that the delegation behavior changes significantly when participants receive
varying types of contextual information. Overall, this research advances the
understanding of human-AI interaction in human delegation and provides
actionable insights for designing more effective collaborative systems.
( 2
min )
In this article, we consider convergence of stochastic gradient descent
schemes (SGD), including momentum stochastic gradient descent (MSGD), under
weak assumptions on the underlying landscape. More explicitly, we show that on
the event that the SGD stays bounded we have convergence of the SGD if there is
only a countable number of critical points or if the objective function
satisfies Lojasiewicz-inequalities around all critical levels as all analytic
functions do. In particular, we show that for neural networks with analytic
activation function such as softplus, sigmoid and the hyperbolic tangent, SGD
converges on the event of staying bounded, if the random variables modelling
the signal and response in the training are compactly supported.
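The momentum scheme analyzed (MSGD) takes the standard heavy-ball form; a minimal sketch of one update:

```python
def msgd_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One momentum SGD step: accumulate a velocity, then move the iterate.
    In the stochastic setting, grad is a noisy estimate of the gradient."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```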
( 2
min )
In this paper we propose to quantify execution time variability of programs
using statistical dispersion parameters. We show how the execution time
variability can be exploited in mixed criticality real-time systems. We propose
a heuristic to compute the execution time budget to be allocated to each low
criticality real-time task according to its execution time variability. We show
using experiments and simulations that the proposed heuristic reduces the
probability of exceeding the allocated budget compared to algorithms which do
not take into account the execution time variability parameter.
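One simple instance of a dispersion-aware budget is mean plus a multiple of a dispersion parameter; the standard deviation and the factor k below are hypothetical choices for illustration, not the paper's specific heuristic:

```python
import statistics

def budget(samples, k=1.0):
    """Execution-time budget for a low-criticality task: central tendency
    plus k times a statistical dispersion parameter of observed runtimes."""
    return statistics.mean(samples) + k * statistics.stdev(samples)
```

A task with highly variable runtimes thus receives proportionally more headroom, lowering its probability of exceeding the allocated budget.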
( 2
min )
This post is co-written with Jayadeep Pabbisetty, Sr. Specialist Data Engineering at Merck, and Prabakaran Mathaiyan, Sr. ML Engineer at Tiger Analytics. The large machine learning (ML) model development lifecycle requires a scalable model release process similar to that of software development. Model developers often work together in developing ML models and require a robust […]
( 8
min )
Editor’s note: All papers referenced here represent collaborations throughout Microsoft and across academia and industry that include authors who contribute to Aether, the Microsoft internal advisory body for AI ethics and effects in engineering and research. A surge of generative AI models in the past year has fueled much discussion about the impact of artificial […]
The post Advancing transparency: Updates on responsible AI research appeared first on Microsoft Research.
( 18
min )
NVIDIA continues to be among America’s very best places to work as judged by employees themselves, rising to second place on Glassdoor’s list of best employers for 2024. This is the fourth consecutive year NVIDIA has been among the top five on the closely watched list, which is based on anonymous employee reviews about their […]
( 5
min )
The memory constraint of always-on devices is one of the major concerns when
deploying speech processing models on these devices. While larger models
trained with a sufficiently large amount of data generally perform better, making
them fit in the device memory is a demanding challenge. In this paper, we aim
to reduce model size by reparameterizing model weights across Transformer
encoder layers and assuming a special weight composition and structure. More
specifically, inspired by ResNet and the more recent LoRA work, we propose an
approach named ResidualTransformer, where each weight matrix in a Transformer
layer comprises 1) a shared full-rank component with its adjacent layers, and
2) a unique low-rank component to itself. The low-rank matrices only account
for a small amount of model size increase. In addition, we add diagonal weight
matrices to improve the modeling capacity of the low-rank matrices. Experiments
on our 10k-hour speech recognition and speech translation tasks show that the
Transformer encoder size can be reduced by ~3X with very slight performance
degradation.
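The weight composition described above can be sketched directly: each layer's weight is a full-rank component shared with adjacent layers, plus a layer-unique low-rank term, plus a diagonal term (shapes and the composition below are an illustration of the stated structure):

```python
import numpy as np

def residual_weight(shared_full_rank, A, B, diag):
    """Compose one layer's (d, d) weight matrix from:
    - shared_full_rank: (d, d) component shared across adjacent layers,
    - B @ A: a layer-unique low-rank component (B: (d, r), A: (r, d)),
    - diag: (d,) diagonal weights added to boost low-rank capacity."""
    return shared_full_rank + B @ A + np.diag(diag)
```

With d = 8 and rank r = 2, the per-layer unique parameters number 2dr + d = 40 versus d*d = 64 for a full matrix, which is the source of the ~3x encoder-size reduction at larger scales.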
( 2
min )
The in-context learning ability of large language models (LLMs) enables them
to generalize to novel downstream tasks with relatively few labeled examples.
However, they require enormous computational resources to be deployed.
Alternatively, smaller models can solve specific tasks if fine-tuned with
enough labeled examples. These examples, however, are expensive to obtain. In
pursuit of the best of both worlds, we study synthetic data generation of
fine-tuning training data via fine-tuned teacher LLMs to improve the downstream
performance of much smaller models. In four text classification and two text
generation tasks, we find that both data generation and annotation dramatically
improve the respective downstream model's performance, occasionally
necessitating only a minor fraction of the original training dataset.
( 2
min )
Personalized recommendations form an important part of today's internet
ecosystem, helping artists and creators to reach interested users, and helping
users to discover new and engaging content. However, many users today are
skeptical of platforms that personalize recommendations, in part due to
historically careless treatment of personal data and data privacy. Now,
businesses that rely on personalized recommendations are entering a new
paradigm, where many of their systems must be overhauled to be privacy-first.
In this article, we propose an algorithm for personalized recommendations that
facilitates both precise and differentially-private measurement. We consider
advertising as an example application, and conduct offline experiments to
quantify how the proposed privacy-preserving algorithm affects key metrics
related to user experience, advertiser value, and platform revenue compared to
the extremes of both (private) non-personalized and non-private, personalized
implementations.
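A standard primitive for the differentially-private measurement side is the Laplace mechanism; the article's exact mechanism is not specified here, so this is a generic sketch for a count with sensitivity 1:

```python
import numpy as np

def dp_count(true_count, epsilon, rng):
    """Laplace mechanism: add noise with scale 1/epsilon to a count of
    sensitivity 1, yielding an epsilon-differentially-private measurement."""
    return true_count + rng.laplace(scale=1.0 / epsilon)
```

Smaller epsilon means stronger privacy but noisier measurements, which is the trade-off the offline experiments quantify against user-experience and revenue metrics.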
( 2
min )
A practical wireless network has many tiers in which end users do not directly
communicate with the central server; moreover, the users' devices have limited
computation and battery power, and the serving base station (BS) has a fixed
bandwidth. Owing to these practical constraints and system models, this paper
leverages model pruning and proposes a pruning-enabled hierarchical federated
learning (PHFL) in heterogeneous networks (HetNets). We first derive an upper
bound of the convergence rate that clearly demonstrates the impact of the model
pruning and wireless communications between the clients and the associated BS.
Then we jointly optimize the model pruning ratio, central processing unit (CPU)
frequency and transmission power of the clients in order to minimize the
controllable terms of the convergence bound under strict delay and energy
constraints. However, since the original problem is not convex, we perform
successive convex approximation (SCA) and jointly optimize the parameters for
the relaxed convex problem. Through extensive simulation, we validate the
effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall
clock time, energy consumption and bandwidth requirement.
( 2
min )
We present the Multi-Modal Discussion Transformer (mDT), a novel method for
detecting hate speech in online social networks such as Reddit discussions. In
contrast to traditional comment-only methods, our approach to labelling a
comment as hate speech involves a holistic analysis of text and images grounded
in the discussion context. This is done by leveraging graph transformers to
capture the contextual relationships in the discussion surrounding a comment,
grounding interwoven fusion layers that combine text and image embeddings
instead of processing modalities separately. To evaluate our work,
we present a new dataset, HatefulDiscussions, comprising complete multi-modal
discussions from multiple online communities on Reddit. We compare the
performance of our model to baselines that only process individual comments and
conduct extensive ablation studies.
( 2
min )
The separate tasks of denoising, least squares expectation, and manifold
learning can often be posed in a common setting of finding the conditional
expectations arising from a product of two random variables. This paper focuses
on this more general problem and describes an operator theoretic approach to
estimating the conditional expectation. Kernel integral operators are used as a
compactification tool, to set up the estimation problem as a linear inverse
problem in a reproducing kernel Hilbert space. This equation is shown to have
solutions that allow numerical approximation, thus guaranteeing the convergence
of data-driven implementations. The overall technique is easy to implement, and
its successful application to some real-world problems is also shown.
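As a much-simplified, kernel-smoothed instance of estimating a conditional expectation (a Nadaraya-Watson estimator, not the paper's operator-theoretic RKHS method), one can write:

```python
import numpy as np

def kernel_conditional_expectation(x_train, y_train, x_query, bandwidth=0.5):
    """Estimate E[Y | X = x] at each query point by Gaussian-kernel
    weighting of training responses."""
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    K = np.exp(-d2 / (2 * bandwidth ** 2))
    return (K @ y_train) / K.sum(axis=1)
```

The kernel integral operator in the paper plays an analogous smoothing role, but as a compactification tool that turns the estimation problem into a well-posed linear inverse problem.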
( 2
min )
Directly predicting human epidermal growth factor receptor 2 (HER2) status
from widely available hematoxylin and eosin (HE)-stained whole slide images
(WSIs) can reduce technical costs and expedite treatment selection. Accurately
predicting HER2 requires large collections of multi-site WSIs. Federated
learning enables collaborative training of these WSIs without gigabyte-size
WSIs transportation and data privacy concerns. However, federated learning
encounters challenges in addressing label imbalance in multi-site WSIs from the
real world. Moreover, existing WSI classification methods cannot simultaneously
exploit local context information and long-range dependencies in the site-end
feature representation of federated learning. To address these issues, we
present a point transformer with federated learning for multi-site HER2 status
prediction from HE-stained WSIs. Our approach incorporates two novel designs.
We propose a dynamic label distribution strategy and an auxiliary classifier,
which helps to establish a well-initialized model and mitigate label
distribution variations across sites. Additionally, we propose a farthest
cosine sampling based on cosine distance. It can sample the most distinctive
features and capture the long-range dependencies. Extensive experiments and
analysis show that our method achieves state-of-the-art performance at four
sites with a total of 2687 WSIs. Furthermore, we demonstrate that our model can
generalize to two unseen sites with 229 WSIs.
( 3
min )
We implement a Bayesian inference process for Neural Networks to model the
time to failure of highly reliable weapon systems with interval-censored data
and time-varying covariates. We analyze and benchmark our approach, LaplaceNN,
on synthetic and real datasets with standard classification metrics such as
Receiver Operating Characteristic (ROC) Area Under the Curve (AUC),
Precision-Recall (PR) AUC, and reliability curve visualizations.
( 2
min )
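The reported metrics can be computed directly from scores and labels. Below is a minimal NumPy sketch of ROC AUC (via the Mann-Whitney statistic) and PR AUC (via the step curve), as an illustration rather than the authors' evaluation code:

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive is scored
    above a random negative (ties count as half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

def pr_auc(y_true, scores):
    """PR AUC by stepping through thresholds in decreasing score order."""
    order = np.argsort(-np.asarray(scores))
    y = np.asarray(y_true)[order]
    tp = np.cumsum(y)
    precision = tp / (np.arange(len(y)) + 1)
    recall = tp / y.sum()
    # area under the precision-recall step curve
    return np.sum(np.diff(np.concatenate([[0.0], recall])) * precision)
```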
This research introduces a sophisticated transfer learning model based on
Google's MobileNetV2 for breast cancer tumor classification into normal,
benign, and malignant categories, utilizing a dataset of 1576 ultrasound images
(265 normal, 891 benign, 420 malignant). The model achieves an accuracy of
0.82, precision of 0.83, recall of 0.81, ROC-AUC of 0.94, PR-AUC of 0.88, and
MCC of 0.74. It examines image intensity distributions and misclassification
errors, offering improvements for future applications. Addressing dataset
imbalances, the study ensures a generalizable model. This work, using a dataset
from Baheya Hospital, Cairo, Egypt, compiled by Walid Al-Dhabyani et al.,
emphasizes MobileNetV2's potential in medical imaging, aiming to improve
diagnostic precision in oncology. Additionally, the paper explores
Streamlit-based deployment for real-time tumor classification, demonstrating
MobileNetV2's applicability in medical imaging and setting a benchmark for
future research in oncology diagnostics.
( 2
min )
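The reported MCC can be computed from confusion counts; the sketch below covers the binary case only (the three-class setting in the study generalizes this), so treat it as illustrative:

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion counts;
    returns 0.0 when the denominator vanishes."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```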
Tiger conservation necessitates the strategic deployment of multifaceted
initiatives encompassing the preservation of ecological habitats, anti-poaching
measures, and community involvement for sustainable growth in the tiger
population. With the advent of artificial intelligence, tiger surveillance can
be automated using object detection. In this paper, an accurate illumination
invariant framework is proposed based on EnlightenGAN and YOLOv8 for tiger
detection. The fine-tuned YOLOv8 model achieves a mAP score of 61% without
illumination enhancement. The illumination enhancement improves the mAP by
0.7%. The approaches elevate the state-of-the-art performance on the ATRW
dataset by approximately 6% to 7%.
( 2
min )
Although gradient descent with momentum is widely used in modern deep
learning, a concrete understanding of its effects on the training trajectory
still remains elusive. In this work, we empirically show that momentum gradient
descent with a large learning rate and learning rate warmup displays large
catapults, driving the iterates towards flatter minima than those found by
gradient descent. We then provide empirical evidence and theoretical intuition
that the large catapult is caused by momentum "amplifying" the
self-stabilization effect (Damian et al., 2023).
( 2
min )
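A toy heavy-ball iteration makes the stability regime behind the catapult discussion concrete. This is a generic sketch on a one-dimensional quadratic, not the paper's experimental setup; the step-size threshold shown in the usage is standard heavy-ball analysis:

```python
def momentum_gd(grad, x0, lr, beta, steps):
    """Heavy-ball momentum: v <- beta * v + grad(x); x <- x - lr * v.
    Returns the iterate trajectory."""
    x, v = float(x0), 0.0
    traj = [x]
    for _ in range(steps):
        v = beta * v + grad(x)
        x = x - lr * v
        traj.append(x)
    return traj
```

On a quadratic with curvature `a`, the iteration is stable only while `lr * a < 2 * (1 + beta)`; past that point the iterates blow up, a one-dimensional caricature of the large catapults observed in the paper.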
In this work, we propose an end-to-end adaptive sampling neural network
(MMPDE-Net) based on the moving mesh method, which can adaptively generate new
sampling points by solving the moving mesh PDE. This model focuses on improving
the quality of sampling points generation. Moreover, we develop an iterative
algorithm based on MMPDE-Net, which makes the sampling points more precise and
controllable. Since MMPDE-Net is a framework independent of the deep learning
solver, we combine it with physics-informed neural networks (PINN) to propose
moving sampling PINN (MS-PINN) and demonstrate its effectiveness by error
analysis under some assumptions. Finally, we demonstrate the performance
improvement of MS-PINN compared to PINN through numerical experiments of four
typical examples, which numerically verify the effectiveness of our method.
( 2
min )
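Adaptive sampling driven by a monitor function can be illustrated with a simple inverse-CDF scheme on [0, 1]. This stand-in does not solve a moving mesh PDE as MMPDE-Net does; it only shows density-proportional point placement:

```python
import numpy as np

def adaptive_samples(monitor, n, grid=1000):
    """Draw n points on [0, 1] with density proportional to a positive
    monitor function, via inverse-CDF sampling on a fine grid."""
    x = np.linspace(0.0, 1.0, grid)
    w = monitor(x)
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    u = (np.arange(n) + 0.5) / n          # stratified uniforms
    return np.interp(u, cdf, x)
```

A monitor peaked where the solution has sharp features concentrates sampling points there, which is the effect the moving mesh method achieves in a principled way.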
We have formulated a family of machine learning problems as the time
evolution of Parametric Probabilistic Models (PPMs), inherently rendering a
thermodynamic process. Our primary motivation is to leverage the rich toolbox
of thermodynamics of information to assess the information-theoretic content of
learning a probabilistic model. We first introduce two information-theoretic
metrics: Memorized-information (M-info) and Learned-information (L-info), which
trace the flow of information during the learning process of PPMs. Then, we
demonstrate that the accumulation of L-info during the learning process is
associated with entropy production, and parameters serve as a heat reservoir in
this process, capturing learned information in the form of M-info.
( 2
min )
Recently, there has been a growing interest in learning and explaining causal
effects within Neural Network (NN) models. By virtue of NN architectures,
previous approaches consider only direct and total causal effects assuming
independence among input variables. We view an NN as a structural causal model
(SCM) and extend our focus to include indirect causal effects by introducing
feedforward connections among input neurons. We propose an ante-hoc method that
captures and maintains direct, indirect, and total causal effects during NN
model training. We also propose an algorithm for quantifying learned causal
effects in an NN model and efficient approximation strategies for quantifying
causal effects in high-dimensional data. Extensive experiments conducted on
synthetic and real-world datasets demonstrate that the causal effects learned
by our ante-hoc method better approximate the ground truth effects compared to
existing methods.
( 2
min )
In this paper, we provide a strategy to determine the eigenvalue decay rate
(EDR) of a large class of kernel functions defined on a general domain rather
than $\mathbb S^{d}$. This class of kernel functions include but are not
limited to the neural tangent kernel associated with neural networks with
different depths and various activation functions. After proving that the
dynamics of training the wide neural networks uniformly approximated that of
the neural tangent kernel regression on general domains, we can further
illustrate the minimax optimality of the wide neural network provided that the
underground truth function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an
interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of
NTK. We also showed that the overfitted neural network can not generalize well.
We believe our approach for determining the EDR of kernels might be also of
independent interests.
( 2
min )
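The eigenvalue decay of a kernel can be probed empirically from its Gram matrix on sample points. The sketch below is a generic numerical proxy, not the paper's analytic strategy for determining the EDR:

```python
import numpy as np

def kernel_eigen_decay(kernel, xs):
    """Eigenvalues of the normalized Gram matrix on sample points xs,
    sorted in decreasing order -- an empirical proxy for the EDR."""
    K = kernel(xs[:, None], xs[None, :]) / len(xs)
    return np.linalg.eigvalsh(K)[::-1]
```

For a smooth kernel such as the Gaussian, the empirical spectrum decays rapidly, mirroring the fast analytic eigenvalue decay.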
Due to mutual interference between users, power allocation problems in
wireless networks are often non-convex and computationally challenging. Graph
neural networks (GNNs) have recently emerged as a promising approach to
tackling these problems and an approach that exploits the underlying topology
of wireless networks. In this paper, we propose a novel graph representation
method for wireless networks that include full-duplex (FD) nodes. We then
design a corresponding FD Graph Neural Network (F-GNN) with the aim of
allocating transmit powers to maximise the network throughput. Our results show
that our F-GNN achieves state-of-the-art performance with significantly less
computation time. Besides, F-GNN offers an excellent trade-off between
performance and complexity compared to classical approaches. We further refine
this trade-off by introducing a distance-based threshold for inclusion or
exclusion of edges in the network. We show that an appropriately chosen
threshold reduces required training time by roughly 20% with a relatively minor
loss in performance.
( 2
min )
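The distance-based edge threshold can be sketched as follows; the radius parameter and the function name are illustrative assumptions:

```python
import numpy as np

def thresholded_adjacency(positions, radius):
    """Connect two nodes only if their Euclidean distance is below
    `radius`; no self-loops. Returns a 0/1 adjacency matrix."""
    p = np.asarray(positions, dtype=float)
    d = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    adj = (d < radius) & ~np.eye(len(p), dtype=bool)
    return adj.astype(int)
```

Dropping long-range (weakly interfering) edges shrinks the graph the GNN must process, which is where the reported training-time savings come from.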
Asymmetrical distance structures (quasimetrics) are ubiquitous in our lives
and are gaining more attention in machine learning applications. Imposing such
quasimetric structures in model representations has been shown to improve many
tasks, including reinforcement learning (RL) and causal relation learning. In
this work, we present four desirable properties in such quasimetric models, and
show how prior works fail at them. We propose Interval Quasimetric Embedding
(IQE), which is designed to satisfy all four criteria. On three quasimetric
learning experiments, IQEs show strong approximation and generalization
abilities, leading to better performance and improved efficiency over prior
methods.
Project Page: https://www.tongzhouwang.info/interval_quasimetric_embedding
Quasimetric Learning Code Package:
https://www.github.com/quasimetric-learning/torch-quasimetric
( 2
min )
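The quasimetric axioms referenced above (non-negativity, identity of indiscernibles, triangle inequality, but not symmetry) can be checked numerically on a pairwise distance matrix; a brute-force sketch:

```python
import itertools
import numpy as np

def is_quasimetric(d):
    """Check quasimetric axioms on a pairwise distance matrix d:
    non-negativity, d(x, x) = 0, and the triangle inequality.
    Symmetry is deliberately NOT required."""
    n = len(d)
    if (d < 0).any() or not np.allclose(np.diag(d), 0):
        return False
    return all(d[i, k] <= d[i, j] + d[j, k] + 1e-9
               for i, j, k in itertools.product(range(n), repeat=3))
```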
While recent research advances in speaker diarization mostly focus on
improving the quality of diarization results, there is also an increasing
interest in improving the efficiency of diarization systems. In this paper, we
demonstrate that a multi-stage clustering strategy that uses different
clustering algorithms for input of different lengths can address multi-faceted
challenges of on-device speaker diarization applications. Specifically, a
fallback clusterer is used to handle short-form inputs; a main clusterer is
used to handle medium-length inputs; and a pre-clusterer is used to compress
long-form inputs before they are processed by the main clusterer. Both the main
clusterer and the pre-clusterer can be configured with an upper bound of the
computational complexity to adapt to devices with different resource
constraints. This multi-stage clustering strategy is critical for streaming
on-device speaker diarization systems, where the budgets of CPU, memory and
battery are tight.
( 2
min )
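The length-based dispatch among the three clusterers can be sketched as a simple router; the thresholds below are illustrative placeholders, not the paper's tuned values:

```python
def clustering_stages(num_segments, short_max=20, medium_max=500):
    """Route input to clustering stages by length: a fallback clusterer
    for short-form input, the main clusterer for medium-length input,
    and pre-clustering compression before the main clusterer for
    long-form input (illustrative sketch)."""
    if num_segments <= short_max:
        return ["fallback"]
    if num_segments <= medium_max:
        return ["main"]
    return ["pre-cluster", "main"]
```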
The stochastic block model is a canonical random graph model for clustering
and community detection on network-structured data. Decades of extensive study
on the problem have established many profound results, among which the phase
transition at the Kesten-Stigum threshold is particularly interesting both from
a mathematical and an applied standpoint. It states that no estimator based on
the network topology can perform substantially better than chance on sparse
graphs if the model parameter is below a certain threshold. Nevertheless, if we
slightly extend the horizon to the ubiquitous semi-supervised setting, this
fundamental limitation disappears completely. We prove that with an arbitrary
fraction of the labels revealed, the detection problem is feasible throughout
the parameter domain. Moreover, we introduce two efficient algorithms, one
combinatorial and one based on optimization, to integrate label information
with graph structures. Our work brings a new perspective to stochastic network
models and semidefinite programming research.
( 2
min )
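As a toy illustration of how revealed labels can be combined with graph structure, consider plain majority-vote propagation (this is not either of the paper's two algorithms, only a sketch of the semi-supervised idea):

```python
import numpy as np

def label_propagation(adj, labels, iters=10):
    """Unlabeled nodes (-1) repeatedly take the majority label of their
    already-labelled neighbours, given an adjacency matrix."""
    labels = np.asarray(labels).copy()
    for _ in range(iters):
        for i in np.where(labels == -1)[0]:
            nbr = labels[np.asarray(adj[i]).astype(bool)]
            nbr = nbr[nbr >= 0]
            if len(nbr):
                labels[i] = np.bincount(nbr).argmax()
    return labels
```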
Astronomical observations typically provide three-dimensional maps, encoding
the distribution of the observed flux in (1) the two angles of the celestial
sphere and (2) energy/frequency. An important task regarding such maps is to
statistically characterize populations of point sources too dim to be
individually detected. As the properties of a single dim source will be poorly
constrained, instead one commonly studies the population as a whole, inferring
a source-count distribution (SCD) that describes the number density of sources
as a function of their brightness. Statistical and machine learning methods for
recovering SCDs exist; however, they typically entirely neglect spectral
information associated with the energy distribution of the flux. We present a
deep learning framework able to jointly reconstruct the spectra of different
emission components and the SCD of point-source populations. In a
proof-of-concept example, we show that our method accurately extracts even
complex-shaped spectra and SCDs from simulated maps.
( 2
min )
The field of eXplainable Artificial Intelligence (XAI) aims to bring
transparency to today's powerful but opaque deep learning models. While local
XAI methods explain individual predictions in form of attribution maps, thereby
identifying where important features occur (but not providing information about
what they represent), global explanation techniques visualize what concepts a
model has generally learned to encode. Both types of methods thus only provide
partial insights and leave the burden of interpreting the model's reasoning to
the user. In this work we introduce the Concept Relevance Propagation (CRP)
approach, which combines the local and global perspectives and thus allows
answering both the "where" and "what" questions for individual predictions. We
demonstrate the capability of our method in various settings, showcasing that
CRP leads to more human interpretable explanations and provides deep insights
into the model's representation and reasoning through concept atlases, concept
composition analyses, and quantitative investigations of concept subspaces and
their role in fine-grained decision making.
( 2
min )
Split Learning (SL) is a promising Distributed Learning approach in
electromyography (EMG) based prosthetic control, due to its applicability
within resource-constrained environments. Other learning approaches, such as
Deep Learning and Federated Learning (FL), provide suboptimal solutions, since
prosthetic devices are extremely limited in terms of processing power and
battery life. The viability of implementing SL in such scenarios stems from
its inherent model partitioning, with clients executing the smaller model
segment. However, selecting an inadequate cut layer hinders the training
process in SL systems. This paper presents an algorithm for optimal cut layer
selection in terms of maximizing the convergence rate of the model. The
performance evaluation demonstrates that the proposed algorithm substantially
accelerates the convergence in an EMG pattern recognition task for improving
prosthetic device control.
( 2
min )
The accurate identification of walnuts within orchards brings forth a
plethora of advantages, profoundly amplifying the efficiency and productivity
of walnut orchard management. However, the close resemblance in shape, color,
and texture between walnuts and leaves presents a formidable challenge in
precisely distinguishing between them during the annotation process. In this
study, we present a novel approach to improve walnut detection efficiency,
utilizing YOLOv5 trained on an enriched image set that incorporates both real
and synthetic RGB and NIR images. Our analysis comparing results from our
original and augmented datasets shows clear improvements in detection when
using the synthetic images.
( 2
min )
Many adversarial attacks target natural language processing systems, most of
which succeed through modifying the individual tokens of a document. Despite
the apparent uniqueness of each of these attacks, fundamentally they are simply
a distinct configuration of four components: a goal function, allowable
transformations, a search method, and constraints. In this survey, we
systematically present the different components used throughout the literature,
using an attack-independent framework which allows for easy comparison and
categorisation of components. Our work aims to serve as a comprehensive guide
for newcomers to the field and to spark targeted research into refining the
individual attack components.
( 2
min )
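The four-component framing (goal function, allowable transformations, search method, constraints) can be made concrete with a minimal greedy search; the toy goal, transformation, and constraint below are illustrative stand-ins:

```python
def greedy_attack(text, goal, transform, constraint):
    """Minimal instance of the four-component attack framing: a greedy
    search applies allowable transformations, keeps only candidates
    passing the constraint, and stops once the goal is satisfied."""
    current = text
    while not goal(current):
        candidates = [c for c in transform(current) if constraint(current, c)]
        if not candidates:
            return None          # search exhausted without reaching goal
        current = candidates[0]  # greedy: take the first admissible edit
    return current
```

Swapping any one component (e.g. beam search for the greedy step, or a semantic-similarity constraint) yields a different published attack, which is the survey's point.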
Due to their unsupervised training and uncertainty estimation, deep
Variational Autoencoders (VAEs) have become powerful tools for
reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based
TSAD methods, either statistical or deep, tune meta-priors to estimate the
likelihood probability for effectively capturing spatiotemporal dependencies in
the data. However, these methods confront the challenge of inherent data
scarcity, which is often the case in anomaly detection tasks. Such scarcity
easily leads to latent holes, discontinuous regions in latent space, resulting
in non-robust reconstructions on these discontinuous spaces. We propose a novel
generative framework that combines VAEs with self-supervised learning (SSL) to
address this issue.
( 2
min )
Recently, Heterogeneous Federated Learning (HtFL) has attracted attention due
to its ability to support heterogeneous models and data. To reduce the high
communication cost of transmitting model parameters, a major challenge in HtFL,
prototype-based HtFL methods are proposed to solely share class
representatives, a.k.a. prototypes, among heterogeneous clients while
maintaining the privacy of clients' models. However, these prototypes are
naively aggregated into global prototypes on the server using weighted
averaging, resulting in suboptimal global knowledge which negatively impacts
the performance of clients. To overcome this challenge, we introduce a novel
HtFL approach called FedTGP, which leverages our Adaptive-margin-enhanced
Contrastive Learning (ACL) to learn Trainable Global Prototypes (TGP) on the
server. By incorporating ACL, our approach enhances prototype separability
while preserving semantic meaning. Extensive experiments with twelve
heterogeneous models demonstrate that our FedTGP surpasses state-of-the-art
methods by up to 9.08% in accuracy while maintaining the communication and
privacy advantages of prototype-based HtFL. Our code is available at
https://github.com/TsingZ0/FedTGP.
( 2
min )
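The naive weighted-averaging baseline that FedTGP improves upon can be sketched per class in a few lines (the function name is ours):

```python
import numpy as np

def aggregate_prototypes(client_protos, client_counts):
    """Naive server-side aggregation of one class's prototypes:
    sample-count-weighted average over clients."""
    protos = np.asarray(client_protos, dtype=float)   # (clients, dim)
    w = np.asarray(client_counts, dtype=float)
    return (w[:, None] * protos).sum(0) / w.sum()
```

FedTGP's criticism is that such averaged prototypes can collapse toward each other across classes; its trainable global prototypes are instead optimized for separability.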
We address the challenge of estimating the learning rate for adaptive
gradient methods used in training deep neural networks. While several
learning-rate-free approaches have been proposed, they are typically tailored
for steepest descent. However, although steepest descent methods offer an
intuitive approach to finding minima, many deep learning applications require
adaptive gradient methods to achieve faster convergence. In this paper, we
interpret adaptive gradient methods as steepest descent applied on
parameter-scaled networks, proposing learning-rate-free adaptive gradient
methods. Experimental results verify the effectiveness of this approach,
demonstrating comparable performance to hand-tuned learning rates across
various scenarios. This work extends the applicability of learning-rate-free
methods, enhancing training with adaptive gradient methods.
( 2
min )
This paper explores the application of CNN-DNN network fusion to construct a
robot navigation controller within a simulated environment. The simulated
environment is constructed to model a subterranean rescue situation, such that
an autonomous agent is tasked with finding a goal within an unknown cavernous
system. Imitation learning is used to train the control algorithm to use LiDAR
and camera data to navigate the space and find the goal. The trained model is
then tested for robustness using Monte Carlo simulation.
( 2
min )
Today, many users deploy their microservice-based applications with various
interconnections on a cluster of Cloud machines, subject to stochastic changes
due to dynamic user requirements. To address this problem, we compare three
machine learning (ML) models for predicting the microservice call rates based
on the microservice times and aiming at estimating the scalability
requirements. We apply the linear regression (LR), multilayer perceptron (MLP),
and gradient boosting regression (GBR) models on the Alibaba microservice
traces. The prediction results reveal that the LR model reaches a lower
training time than the GBR and MLP models. However, the GBR reduces the mean
absolute error and the mean absolute percentage error compared to LR and MLP
models. Moreover, the prediction results show that the number of replicas
required for each microservice, as predicted by the gradient boosting model,
closely matches the actual test data.
( 2
min )
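The LR baseline and the MAPE metric can be sketched with plain NumPy; this illustrates the model family and metric, not the paper's pipeline on the Alibaba traces:

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept, via lstsq -- a minimal
    stand-in for the LR baseline. Returns [intercept, coef...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mape(y_true, y_pred):
    """Mean absolute percentage error, one of the reported metrics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```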
Federated learning is an emerging distributed machine learning framework in
the Internet of Vehicles (IoV). In IoV, millions of vehicles are willing to
train the model to share their knowledge. Maintaining an active state means the
participants must report their state to the FL server at fixed intervals and
participate in the next round. However, the cost of maintaining an active state
is very high when there are a huge number of participating vehicles. In this
paper, we propose a distributed client selection scheme to reduce the cost of
maintaining the active state for all participants. The clients with the highest
evaluations are elected among their neighbours. The evaluator considers four
variables: sample quantity, available throughput, computational capability, and
the quality of the local dataset. We adopt fuzzy logic as the evaluator since
no closed-form solution over the four variables exists.
Extensive simulation results show our proposal approximates the centralized
client selection in terms of accuracy and can significantly reduce the
communication overhead.
( 2
min )
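A minimal fuzzy evaluator over the four variables might look as follows; the triangular memberships, the normalization of inputs to [0, 1], and the min t-norm are illustrative assumptions, not the paper's rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def client_score(sample_qty, throughput, compute, data_quality):
    """Hypothetical fuzzy evaluator: each normalized input in [0, 1] is
    fuzzified into a 'high' membership, then the rule 'client is good
    if all inputs are high' is evaluated with the min t-norm."""
    highs = [tri(v, 0.2, 1.0, 1.8) for v in
             (sample_qty, throughput, compute, data_quality)]
    return min(highs)  # min t-norm aggregation
```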
To address the limitations of traffic prediction from location-bound
detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data
source that leverages the extensive coverage of cellular traffic to capture
mobility patterns. Our extensive analysis validates its potential for
transportation. Focusing on vehicle-related GCT flow prediction, we propose a
graph neural network that integrates multivariate, temporal, and spatial facets
for improved accuracy. Experiments reveal our model's superiority over
baselines, especially in long-term predictions. We also highlight the potential
for GCT flow integration into transportation systems.
( 2
min )
In this work, we study the convergence of Hermitian Dynamic Mode
Decomposition (DMD) to the spectral properties of self-adjoint Koopman
operators. Hermitian DMD (HDMD) is a data-driven method for approximating the Koopman
operator associated with an unknown nonlinear dynamical system from
discrete-time snapshots, while preserving the self-adjointness of the operator
on its finite-dimensional approximations. We show that, under suitable
conditions, the eigenvalues and eigenfunctions of HDMD converge to the spectral
properties of the underlying Koopman operator. Along the way, we establish a
general theorem on the convergence of spectral measures, and demonstrate our
results numerically on the two-dimensional Schr\"odinger equation.
( 2
min )
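The core of Hermitian DMD, a least-squares Koopman approximation symmetrized to preserve self-adjointness, can be sketched as follows (an illustrative reading of the method; the convergence machinery of the paper is omitted):

```python
import numpy as np

def hermitian_dmd(X, Y):
    """Sketch of HDMD: least-squares operator mapping snapshot columns
    of X to columns of Y, symmetrized to preserve self-adjointness
    before the eigendecomposition."""
    A = Y @ np.linalg.pinv(X)          # standard DMD operator
    A = 0.5 * (A + A.conj().T)         # enforce Hermitian structure
    evals, evecs = np.linalg.eigh(A)   # real spectrum, orthonormal modes
    return evals, evecs
```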
A genuine signer's signature is naturally unstable even at short
time-intervals, whereas expert forgers always try to perfectly mimic a genuine
signer's signature. This presents a challenge that puts a genuine signer at
risk of being denied access while a forger is granted access. The
implication is a high false acceptance rate (FAR), which is the percentage of
forged signatures classified as belonging to the genuine class. Existing work
has only scratched the surface of signature verification because the
misclassification error remains high. In this paper, a consensus-threshold
distance-based classifier criterion is proposed for offline writer-dependent
signature verification. Using features extracted from SigNet and SigNet-F deep
convolutional neural network models, the proposed classifier minimizes FAR.
This is demonstrated via experiments on four datasets: GPDS-300, MCYT, CEDAR
and Brazilian PUC-PR datasets. On GPDS-300, the consensus threshold classifier
improves the state-of-the-art performance by achieving a 1.27% FAR compared to
8.73% and 17.31% recorded in literature. This performance is consistent across
other datasets and guarantees that the risk of imposters gaining access to
sensitive documents or transactions is minimal.
( 2
min )
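A consensus-of-thresholds decision rule can be sketched as follows; the Euclidean distance and the majority vote over references are simplifying assumptions over the SigNet feature space, not the paper's exact criterion:

```python
import numpy as np

def consensus_verify(query, references, thresholds):
    """Accept a query signature as genuine only if a majority of
    reference signatures lie within their per-reference distance
    thresholds (illustrative sketch)."""
    dists = np.linalg.norm(np.asarray(references) - np.asarray(query), axis=1)
    votes = (dists <= np.asarray(thresholds)).sum()
    return votes > len(references) / 2
```

Requiring agreement from a majority of references, rather than a single nearest one, is what pushes the false acceptance rate down.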
One common approach to solve multi-objective reinforcement learning (MORL)
problems is to extend conventional Q-learning by using vector Q-values in
combination with a utility function. However, issues can arise with this
approach in the context of stochastic environments, particularly when
optimising for the Scalarised Expected Reward (SER) criterion. This paper
extends prior research, providing a detailed examination of the factors
influencing the frequency with which value-based MORL Q-learning algorithms
learn the SER-optimal policy for an environment with stochastic state
transitions. We empirically examine several variations of the core
multi-objective Q-learning algorithm as well as reward engineering approaches,
and demonstrate the limitations of these methods. In particular, we highlight
the critical impact of the noisy Q-value estimates issue on the stability and
convergence of these algorithms.
( 2
min )
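The gap between the Scalarised Expected Reward criterion and the expected scalarised return, which underlies the issues discussed above, is easy to exhibit numerically for a nonlinear utility (a generic illustration, not the paper's environment):

```python
import numpy as np

def ser(episode_returns, utility):
    """Scalarised Expected Reward: utility of the EXPECTED vector return."""
    return utility(np.mean(episode_returns, axis=0))

def esr(episode_returns, utility):
    """Expected Scalarised Return: expectation of per-episode utilities.
    Differs from SER for nonlinear utilities in stochastic settings."""
    return np.mean([utility(r) for r in episode_returns])
```

With a multiplicative utility and two equally likely vector returns [0, 4] and [4, 0], SER evaluates the mean [2, 2] to 4 while ESR averages two zero utilities to 0, so a policy can be SER-optimal yet look worthless under per-episode scalarisation.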
Providing high-quality video at an efficient bitrate is a main challenge in the
video industry. The traditional one-size-fits-all scheme for bitrate ladders is
inefficient, and reaching the best content-aware decision is computationally
impractical due to the extensive encodings required. To mitigate this, we propose a
bitrate and complexity efficient bitrate ladder prediction method using
transfer learning and spatio-temporal features. We propose: (1) using feature
maps from well-known pre-trained DNNs to predict rate-quality behavior with
limited training data; and (2) improving highest quality rung efficiency by
predicting minimum bitrate for top quality and using it for the top rung. The
method tested on 102 video scenes demonstrates 94.1% reduction in complexity
versus brute-force at 1.71% BD-Rate expense. Additionally, transfer learning
was thoroughly studied through four networks and ablation studies.
( 2
min )
Distributed Denial of Service (DDoS) attacks pose a significant threat to the
stability and reliability of online systems. Effective and early detection of
such attacks is pivotal for safeguarding the integrity of networks. In this
work, we introduce an enhanced approach for DDoS attack detection by leveraging
the capabilities of Deep Residual Neural Networks (ResNets) coupled with
synthetic oversampling techniques. Because of the inherent class imbalance in
many cyber-security datasets, conventional methods often struggle with false
negatives, misclassifying subtle DDoS patterns as benign. By applying the
Synthetic Minority Over-sampling Technique (SMOTE) to the CICIDS dataset, we
balance the representation of benign and malicious data points, enabling the
model to better discern intricate patterns indicative of an attack. Our deep
residual network, tailored for this specific task, further refines the
detection process. Experimental results on a real-world dataset demonstrate
that our approach achieves an accuracy of 99.98%, significantly outperforming
traditional methods. This work underscores the potential of combining advanced
data augmentation techniques with deep learning models to bolster
cyber-security defenses.
( 2
min )
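SMOTE's core interpolation step can be sketched in a few lines of NumPy; this is a minimal re-implementation for illustration, not the library version used with the CICIDS dataset:

```python
import numpy as np

def smote(minority, n_synthetic, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic point is a random
    interpolation between a minority sample and one of its k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(seed)
    x = np.asarray(minority, dtype=float)
    # pairwise squared distances within the minority class
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]   # k nearest neighbours per sample
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(x))
        j = nn[i, rng.integers(k)]
        lam = rng.random()
        out.append(x[i] + lam * (x[j] - x[i]))
    return np.array(out)
```

Because synthetic points lie on segments between genuine minority samples, they enlarge the minority region without inventing wholly new feature combinations.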
Alzheimer's is a brain disease that gets worse over time and affects memory,
thinking, and behavior. Alzheimer's disease (AD) can be treated and managed if
it is diagnosed early, which can slow the progression of symptoms and improve
quality of life. In this study, we suggested using the Visual Transformer (ViT)
and bi-LSTM to process MRI images for diagnosing Alzheimer's disease. We used
ViT to extract features from the MRI and then map them to a feature sequence.
Then, we used Bi-LSTM sequence modeling to keep the interdependencies between
related features. In addition, we evaluated the performance of the proposed
model for the binary classification of AD patients using data from the
Alzheimer's Disease Neuroimaging Initiative (ADNI). Finally, we evaluated our
method against other deep learning models in the literature. The proposed
method performs well in terms of accuracy, precision, F-score, and recall for
the diagnosis of AD.
( 2
min )
Calibration is an essential key in machine learning. Semi Unsupervised
Calibration through Prior Adaptation (SUCPA) is a calibration algorithm used in
(but not limited to) large-scale language models, defined by a system of
first-order difference equations. The map derived from this system has the
peculiarity of being non-hyperbolic, with an unbounded set of non-isolated
fixed points. In this work, we prove several convergence properties of this
algorithm from the perspective of dynamical systems. For a binary
classification problem, it can be shown that the algorithm always converges;
more precisely, the map is globally asymptotically stable, and the orbits
converge to a single line of fixed points. Finally, we perform numerical
experiments on a real-world application to support the presented results.
The experiment code is available online.
( 2
min )
This work aims at improving the energy efficiency of decentralized learning
by optimizing the mixing matrix, which controls the communication demands
during the learning process. Through rigorous analysis based on a
state-of-the-art decentralized learning algorithm, the problem is formulated as
a bi-level optimization, with the lower level solved by graph sparsification. A
solution with guaranteed performance is proposed for the special case of
fully-connected base topology and a greedy heuristic is proposed for the
general case. Simulations based on real topology and dataset show that the
proposed solution can lower the energy consumption at the busiest node by
54%-76% while maintaining the quality of the trained model.
( 2
min )
Recently, Transformer-based models have made significant progress in the field
of time series prediction, achieving good results and becoming baseline models
beyond DLinear. This paper proposes a U-Net time series prediction model
(UnetTSF) with linear complexity, which adopts the U-Net architecture. We are
the first to use FPN technology to extract features from time series data,
replacing the method of decomposing time series data into trend and seasonal
terms, while designing a fusion structure suitable for time series data. Tested
on 8 open-source datasets against the best linear model, DLinear, UnetTSF
achieved the best results in 31 of 32 testing projects, with an average
decrease in MSE of 10.1% and an average decrease in MAE of 9.1%. Compared with
the complex Transformer-based PatchTST, UnetTSF obtained 9 optimal MSE results
and 15 optimal MAE results across the 32 testing projects.
( 2
min )
This paper presents an innovative approach to address the challenges of
translating multi-modal emotion recognition models to a more practical and
resource-efficient uni-modal counterpart, specifically focusing on speech-only
emotion recognition. Recognizing emotions from speech signals is a critical
task with applications in human-computer interaction, affective computing, and
mental health assessment. However, existing state-of-the-art models often rely
on multi-modal inputs, incorporating information from multiple sources such as
facial expressions and gestures, which may not be readily available or feasible
in real-world scenarios. To tackle this issue, we propose a novel framework
that leverages knowledge distillation and masked training techniques.
( 2
min )
Metasurfaces have widespread applications in fifth-generation (5G) microwave
communication. Among the metasurface family, free-form metasurfaces excel in
achieving intricate spectral responses compared to regular-shape counterparts.
However, conventional numerical methods for free-form metasurfaces are
time-consuming and demand specialized expertise. Alternatively, recent studies
demonstrate that deep learning has great potential to accelerate and refine
metasurface designs. Here, we present XGAN, an extended generative adversarial
network (GAN) with a surrogate for high-quality free-form metasurface designs.
The proposed surrogate provides a physical constraint to XGAN so that XGAN can
accurately generate metasurfaces monolithically from input spectral responses.
In comparative experiments involving 20000 free-form metasurface designs, XGAN
achieves 0.9734 average accuracy and is 500 times faster than the conventional
methodology. This method facilitates the metasurface library building for
specific spectral responses and can be extended to various inverse design
problems, including optical metamaterials, nanophotonic devices, and drug
discovery.
( 2
min )
The separate tasks of denoising, least squares expectation, and manifold
learning can often be posed in a common setting of finding the conditional
expectations arising from a product of two random variables. This paper focuses
on this more general problem and describes an operator theoretic approach to
estimating the conditional expectation. Kernel integral operators are used as a
compactification tool, to set up the estimation problem as a linear inverse
problem in a reproducing kernel Hilbert space. This equation is shown to have
solutions that allow numerical approximation, thus guaranteeing the convergence
of data-driven implementations. The overall technique is easy to implement, and
its successful application to some real-world problems is also shown.
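The kernel-weighted form of such an estimator can be sketched with a plain Nadaraya-Watson average, a simple finite-sample stand-in for the operator-theoretic construction (the bandwidth and query points are arbitrary choices here):

```python
import numpy as np

def kernel_cond_expectation(x_train, y_train, x_query, bandwidth=0.1):
    # Gaussian kernel weights between query and training points
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    # Normalized kernel average approximates E[Y | X = x]
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(2000)
est = kernel_cond_expectation(x, y, np.array([0.0, 0.5]))
```

At the queries 0 and 0.5, the estimate tracks the true conditional expectation sin(pi x).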
( 2
min )
In this paper, we provide a strategy to determine the eigenvalue decay rate
(EDR) of a large class of kernel functions defined on a general domain rather
than $\mathbb S^{d}$. This class of kernel functions includes, but is not
limited to, the neural tangent kernel associated with neural networks of
different depths and various activation functions. After proving that the
dynamics of training wide neural networks uniformly approximate those of
neural tangent kernel regression on general domains, we further illustrate the
minimax optimality of the wide neural network provided that the ground truth
function $f\in [\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space
associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show
that the overfitted neural network cannot generalize well. We believe our
approach for determining the EDR of kernels may also be of independent
interest.
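A rough numerical illustration of estimating an EDR, here for a Laplace kernel on a 1D domain rather than an NTK (the kernel choice, sample size, and index range are all assumptions of this sketch, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-1, 1, (n, 1))
K = np.exp(-np.abs(x - x.T))                     # Laplace kernel Gram matrix
eigs = np.sort(np.linalg.eigvalsh(K))[::-1] / n  # empirical operator spectrum
# Fit lambda_i ~ C * i^{-r} over a mid-range of indices to read off the EDR r.
idx = np.arange(4, 60)
r = -np.polyfit(np.log(idx), np.log(eigs[idx - 1]), 1)[0]
```

For the Laplace kernel the operator eigenvalues decay like $i^{-2}$, and the fitted exponent lands near 2.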
( 2
min )
We introduce the package ddml for Double/Debiased Machine Learning (DDML) in
Stata. Estimators of causal parameters for five different econometric models
are supported, allowing for flexible estimation of causal effects of endogenous
variables in settings with unknown functional forms and/or many exogenous
variables. ddml is compatible with many existing supervised machine learning
programs in Stata. We recommend using DDML in combination with stacking
estimation which combines multiple machine learners into a final predictor. We
provide Monte Carlo evidence to support our recommendation.
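Although ddml itself is a Stata package, the cross-fitting idea behind DDML for a partially linear model can be sketched in Python with a toy nuisance learner (the kernel-regression learner and the simulated data below are illustrative stand-ins, not part of the package):

```python
import numpy as np

def fit_predict(x_tr, y_tr, x_te, bw=0.5):
    """Nuisance learner: Gaussian-kernel regression (stand-in for any ML learner)."""
    w = np.exp(-(x_te[:, None] - x_tr[None, :]) ** 2 / (2 * bw ** 2))
    return (w @ y_tr) / w.sum(axis=1)

def ddml_plr(y, d, x, folds=2, seed=0):
    """Double/debiased ML for the partially linear model y = theta*d + g(x) + e,
    using cross-fitting so the nuisance fits do not bias theta."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    splits = np.array_split(idx, folds)
    ry, rd = np.empty_like(y), np.empty_like(d)
    for k in range(folds):
        te = splits[k]
        tr = np.concatenate([splits[j] for j in range(folds) if j != k])
        ry[te] = y[te] - fit_predict(x[tr], y[tr], x[te])   # residualize outcome
        rd[te] = d[te] - fit_predict(x[tr], d[tr], x[te])   # residualize treatment
    return (rd @ ry) / (rd @ rd)  # theta_hat from residual-on-residual regression

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
d = np.sin(x) + 0.5 * rng.standard_normal(n)          # treatment, nonlinear in x
y = 2.0 * d + np.cos(x) + 0.1 * rng.standard_normal(n)  # true theta = 2
theta = ddml_plr(y, d, x)
```

The residual-on-residual step is where stacking multiple learners would plug in for `fit_predict`.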
( 2
min )
We introduce two data-driven procedures for optimal estimation and inference
in nonparametric models using instrumental variables. The first is a
data-driven choice of sieve dimension for a popular class of sieve two-stage
least squares estimators. When implemented with this choice, estimators of both
the structural function $h_0$ and its derivatives (such as elasticities)
converge at the fastest possible (i.e., minimax) rates in sup-norm. The second
is for constructing uniform confidence bands (UCBs) for $h_0$ and its
derivatives. Our UCBs guarantee coverage over a generic class of
data-generating processes and contract at the minimax rate, possibly up to a
logarithmic factor. As such, our UCBs are asymptotically more efficient than
UCBs based on the usual approach of undersmoothing. As an application, we
estimate the elasticity of the intensive margin of firm exports in a
monopolistic competition model of international trade. Simulations illustrate
the good performance of our procedures in empirically calibrated designs. Our
results provide evidence against common parameterizations of the distribution
of unobserved firm heterogeneity.
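A bare-bones sketch of sieve two-stage least squares with polynomial bases (the basis choices, sieve dimension, and simulated design are illustrative assumptions, not the paper's data-driven choice):

```python
import numpy as np

def sieve_2sls(y, x, z, J):
    """Sieve 2SLS: approximate h0 by J polynomial basis functions of x,
    instrumented by a (slightly richer) polynomial basis in z."""
    PX = np.vander(x, J, increasing=True)       # sieve basis for h0(x)
    PZ = np.vander(z, J + 2, increasing=True)   # instrument basis
    # First stage: project the endogenous basis onto the instrument space.
    PXhat = PZ @ np.linalg.lstsq(PZ, PX, rcond=None)[0]
    # Second stage: regress y on the projected basis.
    beta = np.linalg.lstsq(PXhat, y, rcond=None)[0]
    return lambda xq: np.vander(xq, J, increasing=True) @ beta

rng = np.random.default_rng(0)
n = 5000
z = rng.uniform(-2, 2, n)
u = rng.standard_normal(n)
x = z + 0.5 * u + 0.2 * rng.standard_normal(n)  # endogenous: x correlates with u
y = x ** 2 + u                                  # true structural function h0(x) = x^2
h = sieve_2sls(y, x, z, J=4)
```

Because z is independent of u, the second-stage fit recovers h0 despite the endogeneity that would bias ordinary least squares.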
( 2
min )
The stochastic block model is a canonical random graph model for clustering
and community detection on network-structured data. Decades of extensive study
on the problem have established many profound results, among which the phase
transition at the Kesten-Stigum threshold is particularly interesting both from
a mathematical and an applied standpoint. It states that no estimator based on
the network topology can perform substantially better than chance on sparse
graphs if the model parameter is below a certain threshold. Nevertheless, if we
slightly extend the horizon to the ubiquitous semi-supervised setting, such a
fundamental limitation will disappear completely. We prove that with arbitrary
fraction of the labels revealed, the detection problem is feasible throughout
the parameter domain. Moreover, we introduce two efficient algorithms, one
combinatorial and one based on optimization, to integrate label information
with graph structures. Our work brings a new perspective to stochastic models
of networks and semidefinite programming research.
( 2
min )
Characterizing the distribution of high-dimensional statistical estimators is
a challenging task, due to the breakdown of classical asymptotic theory in high
dimension. This paper makes progress towards this by developing non-asymptotic
distributional characterizations for approximate message passing (AMP) -- a
family of iterative algorithms that prove effective as both fast estimators and
powerful theoretical machinery -- for both sparse and robust regression. Prior
AMP theory, which focused on high-dimensional asymptotics for the most part,
failed to describe the behavior of AMP when the number of iterations exceeds
$o\big({\log n}/{\log \log n}\big)$ (with $n$ the sample size). We establish
the first finite-sample non-asymptotic distributional theory of AMP for both
sparse and robust regression that accommodates a polynomial number of
iterations. Our results characterize the accuracy of the Gaussian
approximation of the AMP iterates, improving upon all prior results and
implying enhanced distributional characterizations for both the optimally
tuned Lasso and robust M-estimators.
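A textbook AMP iteration for the Lasso, with soft thresholding and the Onsager correction, can be sketched as follows (the threshold schedule and problem sizes are illustrative choices):

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def amp_lasso(A, y, alpha=1.5, iters=30):
    """AMP for sparse regression: soft thresholding plus the Onsager
    correction term that keeps the effective noise approximately Gaussian."""
    n, p = A.shape
    x, z = np.zeros(p), y.copy()
    for _ in range(iters):
        tau = np.linalg.norm(z) / np.sqrt(n)     # effective noise level
        x_new = soft(x + A.T @ z, alpha * tau)
        # Onsager term: (p/n) * fraction of active coordinates * old residual
        onsager = z * (np.abs(x_new) > 0).mean() * (p / n)
        z = y - A @ x_new + onsager
        x = x_new
    return x

rng = np.random.default_rng(0)
n, p, k = 500, 1000, 20
A = rng.standard_normal((n, p)) / np.sqrt(n)     # i.i.d. N(0, 1/n) design
x0 = np.zeros(p); x0[:k] = 5.0                   # k-sparse ground truth
y = A @ x0 + 0.1 * rng.standard_normal(n)
xhat = amp_lasso(A, y)
```

The Gaussianity of `x + A.T @ z - x0` across coordinates is exactly what the distributional theory in the abstract quantifies.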
( 2
min )
In this manuscript, we propose an efficient manifold denoiser based on
landmark diffusion and optimal shrinkage under the complicated high dimensional
noise and compact manifold setup. It is flexible to handle several setups,
including the high ambient space dimension with a manifold embedding that
occupies a subspace of high or low dimensions, and the noise could be colored
and dependent. A systematic comparison with other existing algorithms on both
simulated and real datasets is provided. This manuscript is mainly algorithmic
and we report several existing tools and numerical results. Theoretical
guarantees and more comparisons will be reported in the official paper of this
manuscript.
( 2
min )
Due to their unsupervised training and uncertainty estimation, deep
Variational Autoencoders (VAEs) have become powerful tools for
reconstruction-based Time Series Anomaly Detection (TSAD). Existing VAE-based
TSAD methods, either statistical or deep, tune meta-priors to estimate the
likelihood for effectively capturing spatiotemporal dependencies in
the data. However, these methods confront the challenge of inherent data
scarcity, which is often the case in anomaly detection tasks. Such scarcity
easily leads to latent holes, discontinuous regions in latent space, resulting
in non-robust reconstructions on these discontinuous spaces. We propose a novel
generative framework that combines VAEs with self-supervised learning (SSL) to
address this issue.
( 2
min )
With the rapid adoption of generative AI applications, there is a need for these applications to respond in time to reduce the perceived latency with higher throughput. Foundation models (FMs) are often pre-trained on vast corpora of data with parameters ranging in scale of millions to billions and beyond. Large language models (LLMs) are a […]
( 15
min )
In this post, we walk you through the process to deploy Amazon Q in your AWS account and add it to your Slack workspace. When you’re done, you’ll wonder how you ever managed without it!
( 8
min )
Ninety-eight percent of retailers plan to invest in generative AI in the next 18 months, according to a new survey conducted by NVIDIA. That makes retail one of the industries racing fastest to adopt generative AI to ramp up productivity, transform customer experiences and improve efficiency. Early deployments in the retail industry include personalized shopping […]
( 6
min )
The retail industry is in the midst of a major technology transformation, fueled by the rise in AI. With the highest potential for AI and analytics among all industries, the retail and consumer packaged goods (CPG) sectors are poised to harness the power of AI to enhance operational efficiency, elevate customer and employee experiences and […]
( 6
min )
NVIDIA and the Loss Prevention Research Council (LPRC) are collaborating with several AI companies to showcase a real-time solution for combating and preventing organized retail crime (ORC). The integrated offering provides advance notifications of suspicious behavior inside and outside stores so that authorities can intervene early. The LPRC includes asset-protection executives from more than 85 […]
( 6
min )
AI Weirdness: the strange side of machine learning
( 2
min )
This study examines the impact of class-imbalanced data on deep learning
models and proposes a technique for data balancing by generating synthetic data
for the minority class. Unlike random-based oversampling, our method
prioritizes balancing the informative regions by identifying high entropy
samples. Generating well-placed synthetic data can enhance machine learning
algorithms' accuracy and efficiency, whereas poorly placed ones may lead to
higher misclassification rates. We introduce an algorithm that maximizes the
probability of generating a synthetic sample in the correct region of its class
by optimizing the class posterior ratio. Additionally, to maintain data
topology, synthetic data are generated within each minority sample's
neighborhood. Our experimental results on forty-one datasets demonstrate the
superior performance of our technique in enhancing deep-learning models.
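A simplified sketch of the idea: score minority samples by local label entropy and interpolate new points within each selected sample's neighborhood. Note the entropy scoring below replaces the paper's class-posterior-ratio optimization, so this is a stand-in, not the proposed algorithm:

```python
import numpy as np

def entropy_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """Generate minority samples near high-entropy (boundary) minority points,
    interpolating SMOTE-style inside each selected sample's neighborhood."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    y_all = np.r_[np.ones(len(X_min)), np.zeros(len(X_maj))]
    scores = []
    for x in X_min:
        d = np.linalg.norm(X_all - x, axis=1)
        p = y_all[np.argsort(d)[1:k + 1]].mean()   # local minority fraction
        p = np.clip(p, 1e-6, 1 - 1e-6)
        scores.append(-p * np.log(p) - (1 - p) * np.log(1 - p))
    top = np.argsort(scores)[::-1][:max(1, len(X_min) // 2)]
    out = []
    for _ in range(n_new):
        i = rng.choice(top)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])     # same-class neighbour
        lam = rng.uniform()
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(1)
X_maj = rng.standard_normal((200, 2))
X_min = rng.standard_normal((20, 2)) + np.array([3.0, 0.0])
X_syn = entropy_oversample(X_min, X_maj, n_new=50)
```

Interpolating within same-class neighborhoods is what preserves the data topology the abstract refers to.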
( 2
min )
Weather station data is a valuable resource for climate prediction; however,
its reliability can be limited in remote locations. To compound the issue,
making local predictions often relies on sensor data that may not be accessible
for a new, previously unmonitored location. In response to these challenges, we
propose a novel zero-shot learning approach designed to forecast various
climate measurements at new and unmonitored locations. Our method surpasses
conventional weather forecasting techniques in predicting microclimate
variables by leveraging knowledge extracted from other geographic locations.
( 2
min )
Current state-of-the-art analyses on the convergence of gradient descent for
training neural networks focus on characterizing properties of the loss
landscape, such as the Polyak-Łojasiewicz (PL) condition and the restricted
strong convexity. While gradient descent converges linearly under such
conditions, it remains an open question whether Nesterov's momentum enjoys
accelerated convergence under similar settings and assumptions. In this work,
we consider a new class of objective functions, where only a subset of the
parameters satisfies strong convexity, and show Nesterov's momentum achieves
acceleration in theory for this objective class. We provide two realizations of
the problem class, one of which is deep ReLU networks, which, to the best of
our knowledge, makes this work the first to prove an accelerated convergence
rate for non-trivial neural network architectures.
( 2
min )
Discovering human cognitive and emotional states using multi-modal
physiological signals draws attention across various research applications.
Physiological responses of the human body are influenced by human cognition and
commonly used to analyze cognitive states. From a network science perspective,
the interactions of these heterogeneous physiological modalities in a graph
structure may provide insightful information to support prediction of cognitive
states. However, it is unclear how to derive exact connectivity between
heterogeneous modalities, and there exists a hierarchical structure of
sub-modalities. Existing graph neural networks are designed to learn on
non-hierarchical homogeneous graphs with pre-defined graph structures; they
fail to learn from hierarchical, multi-modal physiological data without a
pre-defined graph structure. To this end, we propose a hierarchical
heterogeneous graph generative network (H2G2-Net) that automatically learns a
graph structure without domain knowledge, as well as a powerful representation
on the hierarchical heterogeneous graph in an end-to-end fashion. We validate
the proposed method on the CogPilot dataset that consists of multi-modal
physiological signals. Extensive experiments demonstrate that our proposed
method outperforms the state-of-the-art GNNs by 5%-20% in prediction accuracy.
( 2
min )
Optical lithography is the main enabler of semiconductor manufacturing. It
requires extensive processing to perform the Resolution Enhancement Techniques
(RETs) required to transfer the design data to working Integrated Circuits
(ICs). The processing power and computational runtime for RET tasks are ever
increasing due to the continuous reduction of the feature size and the
expansion of the chip area. State-of-the-art research has sought Machine
Learning (ML) technologies to reduce runtime and computational power; however,
they are still not used in production. In this study, we analyze the reasons
holding back ML computational lithography from being production ready and
present a novel highly scalable end-to-end flow that enables production-ready
ML-RET correction.
( 2
min )
Understanding and identifying musical shape plays an important role in music
education and performance assessment. To simplify the otherwise time- and
cost-intensive musical shape evaluation, in this paper we explore how
artificial intelligence (AI) driven models can be applied. Considering musical
shape evaluation as a classification problem, a light-weight Siamese residual
neural network (S-ResNN) is proposed to automatically identify musical shapes.
To assess the proposed approach in the context of piano musical shape
evaluation, we have generated a new dataset containing 4116 music pieces
derived from 147 piano preparatory exercises and performed in 28 categories of
musical shapes. The experimental results show that the S-ResNN significantly
outperforms a number of benchmark methods in terms of precision, recall and
F1 score.
( 2
min )
Geometric Sensitive Hashing functions, a family of Locality-Sensitive Hashing
functions, are neural network models that learn class-specific manifold
geometry in supervised learning. However, given a set of supervised learning
tasks, understanding the manifold geometries that can represent each task and
the kinds of relationships between the tasks based on them has received little
attention. We explore a formalization of this question by considering a
generative process where each task is associated with a high-dimensional
manifold, which can be done in brain-like models with neuromodulatory systems.
Following this formulation, we define \emph{Task-specific Geometric Sensitive
Hashing~(T-GSH)} and show that a randomly weighted neural network with a
neuromodulation system can realize this function.
( 2
min )
Sequential optimization methods are often confronted with the curse of
dimensionality in high-dimensional spaces. Current approaches under the
Gaussian process framework are still burdened by the computational complexity
of tracking Gaussian process posteriors and need to partition the optimization
problem into small regions to ensure exploration or assume an underlying
low-dimensional structure. With the idea of transiting the candidate points
towards more promising positions, we propose a new method based on Markov Chain
Monte Carlo to efficiently sample from an approximated posterior. We provide
theoretical guarantees of its convergence in the Gaussian process Thompson
sampling setting. We also show experimentally that both the Metropolis-Hastings
and the Langevin Dynamics version of our algorithm outperform state-of-the-art
methods in high-dimensional sequential optimization and reinforcement learning
benchmarks.
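One Metropolis-Hastings transition of candidate points toward a more promising region can be sketched as follows, with a toy Gaussian density standing in for the approximated posterior:

```python
import numpy as np

def mh_transition(points, log_density, step=0.3, seed=0):
    """One Metropolis-Hastings sweep that moves candidate points toward
    higher-density (more promising) regions of an approximate posterior."""
    rng = np.random.default_rng(seed)
    props = points + step * rng.standard_normal(points.shape)
    accept = (np.log(rng.uniform(size=len(points)))
              < log_density(props) - log_density(points))
    return np.where(accept[:, None], props, points)

# Toy "posterior over promising inputs": Gaussian centred at the optimum (2, -1).
mu = np.array([2.0, -1.0])
logp = lambda x: -0.5 * np.sum((x - mu) ** 2, axis=1)

pts = np.random.default_rng(1).uniform(-5, 5, (256, 2))
for t in range(300):
    pts = mh_transition(pts, logp, seed=t)
```

After enough sweeps the candidate population concentrates around the high-density region; the Langevin variant would add a gradient drift to each proposal.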
( 2
min )
Predicting the solubility of given molecules remains crucial in the
pharmaceutical industry. In this study, we revisited this extensively studied
topic, leveraging the capabilities of contemporary computing resources. We
employed two machine learning models: a linear regression model and a graph
convolutional neural network (GCNN) model, using various experimental datasets.
Both methods yielded reasonable predictions, with the GCNN model exhibiting the
highest level of performance. However, the present GCNN model has limited
interpretability, while the linear regression model allows scientists a more
in-depth analysis of the underlying factors through feature importance
analysis, although more human input and evaluation of the overall dataset are
required. From the perspective of chemistry, using the linear regression model,
we elucidated the impact of individual atom species and functional groups on
overall solubility, highlighting the significance of comprehending how chemical
structure influences chemical properties in the drug development process. We
find that introducing oxygen atoms can increase the solubility of organic
molecules, while almost all heteroatoms other than oxygen and nitrogen tend to
decrease solubility.
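On synthetic data (the atom-count descriptors and coefficients below are invented for illustration, not the paper's datasets), the feature-importance reading of a linear solubility model looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical descriptors: counts of [C, O, N, Cl] atoms per molecule.
counts = rng.integers(0, 8, size=(300, 4)).astype(float)
# Synthetic "log-solubility": oxygen helps, carbon and chlorine hurt.
true_w = np.array([-0.4, 0.6, 0.2, -0.8])
logS = counts @ true_w + 0.1 * rng.standard_normal(300)

X = np.c_[counts, np.ones(300)]        # add intercept column
w, *_ = np.linalg.lstsq(X, logS, rcond=None)
# The signs and magnitudes of w serve as the feature-importance reading:
# a positive oxygen coefficient, negative carbon/chlorine coefficients.
```

This transparency in the fitted coefficients is exactly what the GCNN lacks.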
( 3
min )
In this paper, we introduce FITS, a lightweight yet powerful model for time
series analysis. Unlike existing models that directly process raw time-domain
data, FITS operates on the principle that time series can be manipulated
through interpolation in the complex frequency domain. By discarding
high-frequency components with negligible impact on time series data, FITS
achieves performance comparable to state-of-the-art models for time series
forecasting and anomaly detection tasks, while having a remarkably compact size
of only approximately $10k$ parameters. Such a lightweight model can be easily
trained and deployed in edge devices, creating opportunities for various
applications. The code is available in: \url{https://github.com/VEWOXIC/FITS}
( 2
min )
Quantum machine learning with quantum kernels for classification problems is
a growing area of research. Recently, quantum kernel alignment techniques that
parameterise the kernel have been developed, allowing the kernel to be trained
and therefore aligned with a specific dataset. While quantum kernel alignment
is a promising technique, it has been hampered by considerable training costs
because the full kernel matrix must be constructed at every training iteration.
Addressing this challenge, we introduce a novel method that seeks to balance
efficiency and performance. We present a sub-sampling training approach that
uses a subset of the kernel matrix at each training step, thereby reducing the
overall computational cost of the training. In this work, we apply the
sub-sampling method to synthetic datasets and a real-world breast cancer
dataset and demonstrate considerable reductions in the number of circuits
required to train the quantum kernel while maintaining classification accuracy.
( 2
min )
Sampling-based model predictive control (MPC) has found significant success
in optimal control problems with non-smooth system dynamics and cost function.
Many machine learning-based works proposed to improve MPC by a) learning or
fine-tuning the dynamics/cost function, or b) learning to optimize for the
update of the MPC controllers. For the latter, imitation learning-based
optimizers are trained to update the MPC controller by mimicking the expert
demonstrations, which, however, are expensive or even unavailable. More
significantly, many sequential decision-making problems are in non-stationary
environments, requiring that an optimizer should be adaptable and generalizable
to update the MPC controller for solving different tasks. To address those
issues, we propose to learn an optimizer based on meta-reinforcement learning
(RL) to update the controllers. This optimizer does not need expert
demonstration and can enable fast adaptation (e.g., few-shots) when it is
deployed in unseen control tasks. Experimental results validate the
effectiveness of the learned optimizer regarding fast adaptation.
( 2
min )
Deep machine learning models including Convolutional Neural Networks (CNN)
have been successful in the detection of Mild Cognitive Impairment (MCI) using
medical images, questionnaires, and videos. This paper proposes a novel
Multi-branch Classifier-Video Vision Transformer (MC-ViViT) model to
distinguish MCI from those with normal cognition by analyzing facial features.
The data comes from the I-CONECT, a behavioral intervention trial aimed at
improving cognitive function by providing frequent video chats. MC-ViViT
extracts spatiotemporal features of videos in one branch and augments
representations by the MC module. The I-CONECT dataset is challenging as it is
imbalanced, containing Hard-Easy and Positive-Negative samples, which impedes
the performance of MC-ViViT. We propose a loss function for Hard-Easy and
Positive-Negative Samples (HP Loss) that combines Focal loss and AD-CORRE loss
to address the imbalance problem. Our experimental results on the I-CONECT
dataset show the great potential of MC-ViViT in predicting MCI with a high
accuracy of 90.63% on some of the interview videos.
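The Focal-loss ingredient of HP Loss can be sketched as follows (binary form with illustrative alpha and gamma; the AD-CORRE term is omitted):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples by (1 - p_t)^gamma so
    training focuses on hard samples."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -(at * (1 - pt) ** gamma * np.log(pt)).mean()

p = np.array([0.9, 0.6, 0.2])   # predicted P(y = 1)
y = np.array([1, 1, 0])
loss_focal = focal_loss(p, y)
loss_ce = -np.where(y == 1, np.log(p), np.log(1 - p)).mean()
```

Relative to plain cross-entropy, confidently classified (easy) samples contribute almost nothing, which is how the loss counteracts the Hard-Easy imbalance.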
( 3
min )
With the continued introduction of driverless events to Formula:Society of
Automotive Engineers (F:SAE) competitions around the world, teams are
investigating all aspects of the autonomous vehicle stack. This paper presents
the use of Deep Reinforcement Learning (DRL) and Inverse Reinforcement Learning
(IRL) to map locally-observed cone positions to a desired steering angle for
race track following. Two state-of-the-art algorithms not previously tested in
this context, soft actor-critic (SAC) and adversarial inverse reinforcement
learning (AIRL), are used to train models in a representative simulation. Three
novel reward functions for use by RL algorithms in an autonomous racing context
are also discussed. Tests performed in simulation and the real world suggest
that both algorithms can successfully train models for local path following.
Suggestions for future work are presented to allow these models to scale to a
full F:SAE vehicle.
( 2
min )
As the global population continues to expand, the demand for natural
resources increases. Unfortunately, human activities account for 23% of
greenhouse gas emissions. On a positive note, remote sensing technologies have
emerged as a valuable tool in managing our environment. These technologies
allow us to monitor land use, plan urban areas, and drive advancements in areas
such as agriculture, climate change mitigation, disaster recovery, and
environmental monitoring. Recent advances in AI, computer vision, and earth
observation data have enabled unprecedented accuracy in land use mapping. By
using transfer learning and fine-tuning with RGB bands, we achieved an
impressive 99.19% accuracy in land use analysis. Such findings can be used to
inform conservation and urban planning policies.
( 2
min )
We develop a distributed Block Chebyshev-Davidson algorithm to solve
large-scale leading eigenvalue problems for spectral analysis in spectral
clustering. First, the efficiency of the Chebyshev-Davidson algorithm relies on
the prior knowledge of the eigenvalue spectrum, which could be expensive to
estimate. This issue can be lessened by the analytic spectrum estimation of the
Laplacian or normalized Laplacian matrices in spectral clustering, making the
proposed algorithm very efficient for spectral clustering. Second, to make the
proposed algorithm capable of analyzing big data, a distributed and parallel
version has been developed with attractive scalability. The speedup by parallel
computing is approximately equivalent to $\sqrt{p}$, where $p$ denotes the
number of processes. Numerical results will be provided to demonstrate its
efficiency in spectral clustering and scalability advantage over existing
eigensolvers used for spectral clustering in parallel computing environments.
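The Chebyshev filtering step at the heart of such solvers can be sketched as follows (dense toy matrix, with the damping interval [a, b] assumed known; the paper's contribution is estimating this interval analytically and distributing the computation):

```python
import numpy as np

def cheb_filter(A, v, deg, a, b):
    """Apply the degree-`deg` Chebyshev polynomial of A, mapped so the
    unwanted spectrum [a, b] is damped; eigenvectors whose eigenvalues lie
    outside [a, b] are strongly amplified."""
    e, c = (b - a) / 2, (b + a) / 2
    y = (A @ v - c * v) / e                  # T_1 of the mapped operator
    v_prev = v
    for _ in range(deg - 1):                 # three-term Chebyshev recurrence
        y_new = 2 * (A @ y - c * y) / e - v_prev
        v_prev, y = y, y_new
    return y / np.linalg.norm(y)

rng = np.random.default_rng(0)
n = 300
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.linspace(0, 1, n); eigs[-1] = 2.0  # one leading eigenvalue at 2
A = (Q * eigs) @ Q.T
v = cheb_filter(A, rng.standard_normal(n), deg=15, a=0.0, b=1.0)
lam = v @ A @ v                              # Rayleigh quotient, approx. 2
```

A single degree-15 filter pass already isolates the leading eigenvector, which is why a good estimate of [a, b] makes the Davidson iteration so cheap.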
( 2
min )
This paper presents our recent initiatives to foster the discoverability of
new releases on the music streaming service Deezer. After introducing our
search and recommendation features dedicated to new releases, we outline our
shift from editorial to personalized release suggestions using cold start
embeddings and contextual bandits. Backed by online experiments, we discuss the
advantages of this shift in terms of recommendation quality and exposure of new
releases on the service.
( 2
min )
Deep generative replay has emerged as a promising approach for continual
learning in decision-making tasks. This approach addresses the problem of
catastrophic forgetting by leveraging the generation of trajectories from
previously encountered tasks to augment the current dataset. However, existing
deep generative replay methods for continual learning rely on autoregressive
models, which suffer from compounding errors in the generated trajectories. In
this paper, we propose a simple, scalable, and non-autoregressive method for
continual learning in decision-making tasks using a generative model that
generates task samples conditioned on the trajectory timestep. We evaluate our
method on Continual World benchmarks and find that our approach achieves
state-of-the-art performance on the average success rate metric among continual
learning methods. Code is available at https://github.com/WilliamYue37/t-DGR .
( 2
min )
Climate change poses increasingly complex challenges to our society. Extreme
weather events such as floods, wild fires or droughts are becoming more
frequent, spontaneous and difficult to foresee or counteract. In this work we
specifically address the problem of sewage water polluting surface water bodies
after spilling over from rain tanks as a consequence of heavy rain events. We
investigate to what extent state-of-the-art interpretable time series models
can help predict such critical water level points, so that the excess can
promptly be redistributed across the sewage network. Our results indicate that
modern time series models can contribute to better waste water management and
prevention of environmental pollution from sewer systems. All the code and
experiments can be found in our repository:
https://github.com/TeodorChiaburu/RIWWER_TimeSeries.
( 2
min )
Large Language Models (LLMs) have showcased impressive capabilities in
handling straightforward programming tasks. However, their performance tends to
falter when confronted with more challenging programming problems. We observe
that conventional models often generate solutions as monolithic code blocks,
restricting their effectiveness in tackling intricate questions. To overcome
this limitation, we present Modular-of-Thought Coder (MoTCoder). We introduce a
pioneering framework for MoT instruction tuning, designed to promote the
decomposition of tasks into logical sub-tasks and sub-modules. Our
investigations reveal that, through the cultivation and utilization of
sub-modules, MoTCoder significantly improves both the modularity and
correctness of the generated solutions, leading to substantial relative pass@1
improvements of 12.9% on APPS and 9.43% on CodeContests. Our codes are
available at https://github.com/dvlab-research/MoTCoder.
( 2
min )
A physics-informed neural network (PINN) is a data-driven solver for partial
and ordinary differential equations (ODEs/PDEs). It provides a unified
framework to address both forward and inverse problems. However, the
complexity of the objective function often leads to training failures. This
issue is particularly prominent when solving high-frequency and multi-scale
problems. We propose using transfer learning to boost the robustness and
convergence of PINN training, starting from low-frequency problems and
gradually approaching high-frequency problems. Through two case studies, we
found that transfer learning can effectively train PINNs to approximate
solutions from low-frequency to high-frequency problems without increasing
network parameters.
Furthermore, it requires fewer data points and less training time. We
elaborately described our training strategy, including optimizer selection, and
suggested guidelines for using transfer learning to train neural networks for
solving more complex problems.
( 2
min )
We discuss causal inference for observational studies with possibly invalid
instrumental variables. We propose a novel methodology called two-stage
curvature identification (TSCI) by exploring the nonlinear treatment model with
machine learning. The first-stage machine learning enables improving the
instrumental variable's strength and adjusting for different forms of
violation of the instrumental variable assumptions. The success of TSCI requires the
instrumental variable's effect on treatment to differ from its violation form.
A novel bias correction step is implemented to remove bias resulting from the
potentially high complexity of machine learning. Our proposed \texttt{TSCI}
estimator is shown to be asymptotically unbiased and Gaussian even if the
machine learning algorithm does not consistently estimate the treatment model.
Furthermore, we design a data-dependent method to choose the best among several
candidate violation forms. We apply TSCI to study the effect of education on
earnings.
( 2
min )
A multimodal system uses models trained on language, vision, and action data to help robots develop and execute plans for household, construction, and manufacturing tasks.
( 10
min )
MIT researchers propose “PEDS” method for developing models of complex physical systems in mechanics, optics, thermal transport, fluid dynamics, physical chemistry, climate, and more.
( 8
min )
AWS customers in healthcare, financial services, the public sector, and other industries store billions of documents as images or PDFs in Amazon Simple Storage Service (Amazon S3). However, they’re unable to gain insights such as using the information locked in the documents for large language models (LLMs) or search until they extract the text, forms, […]
( 10
min )
Generative AI is transforming drug research and development, enabling new discoveries faster than ever — and Amgen, one of the world’s leading biotechnology companies, is tapping the technology to power its research. Amgen will build AI models trained to analyze one of the world’s largest human datasets on an NVIDIA DGX SuperPOD, a full-stack data […]
( 6
min )
In perhaps the healthcare industry’s most dramatic transformation since the advent of computing, digital biology and generative AI are helping to reinvent drug discovery, surgery, medical imaging and wearable devices. NVIDIA has been preparing for this moment for over a decade, building deep domain expertise, creating the NVIDIA Clara healthcare-specific computing platform and expanding its […]
( 7
min )
The AI revolution returned to where it started this week, putting powerful new tools into the hands of gamers and content creators. Generative AI models that will bring lifelike characters to games and applications and new GPUs for gamers and creators were among the highlights of a news-packed address Monday ahead of this week’s CES […]
( 9
min )
Amid explosive interest in generative AI, the auto industry is racing to embrace the power of AI across a range of critical activities, from vehicle design, engineering and manufacturing, to marketing and sales. The adoption of generative AI — along with the growing importance of software-defined computing — will continue to transform the automotive market […]
( 6
min )
NVIDIA Studio is debuting at CES powerful new software and hardware upgrades to elevate content creation.
( 11
min )
Twitch, OBS and NVIDIA are leveling up livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Launching in a few days, the beta will let streamers deliver multiple encodes concurrently, providing an optimal viewing experience for every viewer.
( 5
min )
Getty Images, a global visual content creator and marketplace, today at CES released Generative AI by iStock, an affordable and commercially safe image generation service trained on the company’s creative library of licensed, proprietary data. Built on NVIDIA Picasso, a foundry for custom AI models, Generative AI by iStock provides designers and businesses with a […]
( 5
min )
Whether building a super-capable truck or conjuring up a dream sports car, spending hours playing with online car configurators is easy. With auto industry insiders predicting that most new vehicle purchases will move online by 2030, these configurators are more than just toys. They’re crucial to the future of the world’s automakers — essential in […]
( 6
min )
NVIDIA is bringing more games, membership options and innovative tech to its GeForce NOW cloud gaming service. The next Activision and Blizzard titles to join the cloud, Diablo IV and Overwatch 2, will be coming soon. They’ll be joined by a host of top titles, including Capcom’s Exoprimal, HoYoverse’s Honkai: Star Rail and Mainframe Industries’ […]
( 9
min )
Generative AI is reshaping trillion-dollar industries, and NVIDIA, a front-runner in smart robotics, is seizing the moment. Speaking today as part of a special address ahead of CES, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla detailed how NVIDIA and its partners are bringing generative AI and robotics together. It’s a natural fit, […]
( 6
min )
In our fast-changing, digitized world, business strategies and content planning are also moving into the world of numbers, minimizing the need for human work. Artificial intelligence is developing day by day, reaching ever more users and areas of use. Below you will learn about AI chatbots, their advantages and disadvantages. You will… Read More »Unleashing innovation: How AI chatbots transform your website strategy
The post Unleashing innovation: How AI chatbots transform your website strategy appeared first on Data Science Central.
( 23
min )
There is a new letter in TIME, What Generative AI Reveals About the Human Mind, in which a professor wrote: “Natural brains must learn to predict those sensory flows in a very special kind of context—the context of using the sensory information to select actions that help us survive and thrive in our worlds. This means… Read More »Textual predictive coding: Do LLMs and the human mind compare?
( 20
min )
In the fast-paced landscape of data-driven decision-making, real-time analytics has become paramount for organizations seeking insights at the speed of business. Database streaming services have emerged as a transformative solution, enabling the processing and analysis of data in motion. This article explores the capabilities of database streaming services and… Read More »Real-time analytics with database streaming services: Harnessing data velocity
( 21
min )
In Part 1 of the series “GenAI: Beware the Productivity Trap,” we discussed embracing an economic mindset to avoid falling into the productivity trap. We discussed some challenges with the productivity trap and then reviewed some data economic concepts that can take your organization to the next level of game-changing performance and innovation. In Part… Read More »GenAI: Beware the Productivity Trap; It’s About Nanoeconomics – Part 2
( 20
min )
Recent CNN- and Transformer-based models have tried to exploit frequency and
periodicity information for long-term time series forecasting. However, most
existing work is based on the Fourier transform, which cannot capture
fine-grained, local frequency structure. In this paper, we propose a Wavelet-Fourier
Transform Network (WFTNet) for long-term time series forecasting. WFTNet
utilizes both Fourier and wavelet transforms to extract comprehensive
temporal-frequency information from the signal, where Fourier transform
captures the global periodic patterns and wavelet transform captures the local
ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to
adaptively balance the importance of global and local frequency patterns.
Extensive experiments on various time series datasets show that WFTNet
consistently outperforms other state-of-the-art baselines. Code is available at
https://github.com/Hank0626/WFTNet.
( 2
min )
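The complementary roles of the two transforms can be illustrated with plain NumPy. This is a toy sketch of the idea, not the WFTNet architecture; a single-level Haar transform stands in for the wavelet branch, and the signal is made up:

```python
import numpy as np

def dominant_period(x):
    """Global periodicity from the Fourier amplitude spectrum."""
    spec = np.abs(np.fft.rfft(x - x.mean()))
    k = np.argmax(spec[1:]) + 1          # skip the DC bin
    return len(x) / k

def haar_detail(x):
    """One-level Haar wavelet transform; detail coefficients expose
    local, fine-grained structure that the global FFT averages away."""
    pairs = x[: len(x) // 2 * 2].reshape(-1, 2)
    return (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)

t = np.arange(512)
signal = np.sin(2 * np.pi * t / 64)   # global period of 64 steps
signal[301] += 3.0                    # a purely local transient
print(dominant_period(signal))        # 64.0: Fourier recovers the global period
print(np.argmax(np.abs(haar_detail(signal))) * 2)  # 300: wavelet localizes the spike
```

The FFT pins down the global period but smears the spike across all bins; the Haar detail coefficients do the reverse, which is why WFTNet combines both views.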
A cost-effective alternative to manual data labeling is weak supervision
(WS), where data samples are automatically annotated using a predefined set of
labeling functions (LFs), rule-based mechanisms that generate artificial labels
for the associated classes. In this work, we investigate noise reduction
techniques for WS based on the principle of k-fold cross-validation. We
introduce a new algorithm ULF for Unsupervised Labeling Function correction,
which denoises WS data by leveraging models trained on all but some LFs to
identify and correct biases specific to the held-out LFs. Specifically, ULF
refines the allocation of LFs to classes by re-estimating this assignment on
highly reliable cross-validated samples. Evaluation on multiple datasets
confirms ULF's effectiveness in enhancing WS learning without the need for
manual labeling.
( 2
min )
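The leave-some-LFs-out principle behind ULF can be sketched with a toy majority-vote aggregator on synthetic data. This is only the cross-validation idea; the actual ULF re-estimates LF-to-class allocations with trained models, not a raw consensus:

```python
import numpy as np

def majority_vote(L):
    """Aggregate LF votes (n_samples x n_lfs, entries in {-1, +1}) by majority."""
    return np.sign(L.sum(axis=1))

def lf_holdout_disagreement(L):
    """For each labeling function, compare its votes against the consensus
    of all *other* LFs -- the leave-out principle behind ULF's denoising."""
    scores = []
    for j in range(L.shape[1]):
        consensus = majority_vote(np.delete(L, j, axis=1))
        scores.append(np.mean(L[:, j] != consensus))
    return np.array(scores)

rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=200)                      # latent true labels
noisy = np.where(rng.uniform(size=200) < 0.4, -y, y)   # a 40%-noise LF
L = np.stack([y, y, y, noisy], axis=1)
print(lf_holdout_disagreement(L))  # the noisy fourth LF stands out
```

An LF whose votes clash with the held-out consensus is a candidate for having its class assignment corrected.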
We study the influence of different activation functions in the output layer
of deep neural network models for soft and hard label prediction in the
learning with disagreement task. In this task, the goal is to quantify the
amount of disagreement via predicting soft labels. To predict the soft labels,
we use BERT-based preprocessors and encoders and vary the activation function
used in the output layer, while keeping other parameters constant. The soft
labels are then used for the hard label prediction. The activation functions
considered are sigmoid as well as a step-function that is added to the model
post-training and a sinusoidal activation function, which is introduced for the
first time in this paper.
( 2
min )
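The three output activations the abstract compares can be written down directly. The exact sinusoidal form used in the paper is not specified here, so the 0.5 * (1 + sin z) squashing below is an illustrative assumption:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step(z, threshold=0.0):
    """Hard step applied post-training: turns logits into hard labels."""
    return (z > threshold).astype(float)

def sinusoidal(z):
    """Illustrative sinusoidal output activation squashed into [0, 1]."""
    return 0.5 * (1.0 + np.sin(z))

logits = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(logits).round(3))   # soft labels quantify disagreement
print(step(logits))               # hard labels derived from the same logits
print(sinusoidal(logits).round(3))
```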
Bayesian networks (BNs) are a foundational model in machine learning and
causal inference. Their graphical structure can handle high-dimensional
problems by dividing them into a sparse collection of smaller ones; it underlies
Judea Pearl's causality and determines their explainability and interpretability.
Despite their popularity, there are almost no resources in the literature on
how to compute Shannon's entropy and the Kullback-Leibler (KL) divergence for
BNs under their most common distributional assumptions. In this paper, we
provide computationally efficient algorithms for both by leveraging BNs'
graphical structure, and we illustrate them with a complete set of numerical
examples. In the process, we show it is possible to reduce the computational
complexity of KL from cubic to quadratic for Gaussian BNs.
( 2
min )
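For discrete BNs, the graphical structure reduces the joint entropy to a sum of local conditional entropies, which is the kind of saving the paper exploits. A two-node sketch with illustrative numbers:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Two-node discrete BN A -> B, so P(A, B) = P(A) P(B | A) and the
# chain rule gives H(A, B) = H(A) + sum_a P(a) H(B | A = a).
p_a = np.array([0.3, 0.7])
p_b_given_a = np.array([[0.9, 0.1],    # P(B | A = 0)
                        [0.2, 0.8]])   # P(B | A = 1)

h_chain = entropy(p_a) + sum(p_a[a] * entropy(p_b_given_a[a]) for a in range(2))

# Cross-check against the entropy of the full joint distribution
joint = p_a[:, None] * p_b_given_a
h_joint = entropy(joint.ravel())
print(h_chain, h_joint)  # the two values agree
```

The chain-rule form never materializes the exponentially large joint table, which is what makes entropy computation tractable on sparse BNs.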
Fake news detection models are critical to countering disinformation but can
be manipulated through adversarial attacks. In this position paper, we analyze
how an attacker can compromise the performance of an online learning detector
on specific news content without being able to manipulate the original target
news. In some contexts, such as social networks, where the attacker cannot
exert complete control over all the information, this scenario can indeed be
quite plausible. Therefore, we show how an attacker could potentially introduce
poisoning data into the training data to manipulate the behavior of an online
learning method. Our initial findings reveal varying susceptibility of logistic
regression models based on complexity and attack type.
( 2
min )
We present a deep learning model to automatically generate computer models of
the human heart from patient imaging data with an emphasis on its capability to
generate thin-walled cardiac structures. Our method works by deforming a
template mesh to fit the cardiac structures to the given image. Compared with
prior deep learning methods that adopted this approach, our framework is
designed to minimize mesh self-penetration, which typically arises when
deforming surface meshes separated by small distances. We achieve this by using
a two-stage diffeomorphic deformation process along with a novel loss function
derived from the kinematics of motion that penalizes surface contact and
interpenetration. Our model demonstrates comparable accuracy with
state-of-the-art methods while additionally producing meshes free of
self-intersections. The resultant meshes are readily usable in physics based
simulation, minimizing the need for post-processing and cleanup.
( 2
min )
Large language models have made significant strides in natural language
processing, enabling innovative applications in molecular science by processing
textual representations of molecules. However, most existing language models
cannot capture the rich information with complex molecular structures or
images. In this paper, we introduce GIT-Mol, a multi-modal large language model
that integrates the Graph, Image, and Text information. To facilitate the
integration of multi-modal molecular data, we propose GIT-Former, a novel
architecture that is capable of aligning all modalities into a unified latent
space. We achieve a 5%-10% accuracy increase in property prediction and a
20.2% boost in molecule generation validity compared to the baselines. With the
any-to-language molecular translation strategy, our model has the potential to
perform more downstream tasks, such as compound name recognition and chemical
reaction prediction.
( 2
min )
The conflict between stiffness and toughness is a fundamental problem in
engineering materials design. However, the systematic discovery of
microstructured composites with optimal stiffness-toughness trade-offs has
never been demonstrated, hindered by the discrepancies between simulation and
reality and the lack of data-efficient exploration of the entire Pareto front.
We introduce a generalizable pipeline that integrates physical experiments,
numerical simulations, and artificial neural networks to address both
challenges. Without any prescribed expert knowledge of material design, our
approach implements a nested-loop proposal-validation workflow to bridge the
simulation-to-reality gap and discover microstructured composites that are
stiff and tough with high sample efficiency. Further analysis of Pareto-optimal
designs allows us to automatically identify existing toughness enhancement
mechanisms, which were previously discovered through trial-and-error or
biomimicry. On a broader scale, our method provides a blueprint for
computational design in various research areas beyond solid mechanics, such as
polymer chemistry, fluid dynamics, meteorology, and robotics.
( 2
min )
Kernel Stein discrepancies (KSDs) measure the quality of a distributional
approximation and can be computed even when the target density has an
intractable normalizing constant. Notable applications include the diagnosis of
approximate MCMC samplers and goodness-of-fit tests for unnormalized
statistical models. The present work analyzes the convergence control
properties of KSDs. We first show that standard KSDs used for weak convergence
control fail to control moment convergence. To address this limitation, we next
provide sufficient conditions under which alternative diffusion KSDs control
both moment and weak convergence. As an immediate consequence we develop, for
each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein
convergence.
( 2
min )
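A squared KSD can be estimated from samples alone because only the score function of the target enters the Stein kernel, so the normalizing constant cancels. A minimal one-dimensional sketch for a standard normal target with an RBF base kernel (the fixed bandwidth and sample sizes are illustrative choices):

```python
import numpy as np

def ksd_squared(x, h=1.0):
    """V-statistic estimate of the squared KSD for target N(0, 1),
    using an RBF base kernel; the score of N(0, 1) is s(x) = -x."""
    x = np.asarray(x, dtype=float)
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * h**2))
    dkx = -d / h**2 * k                    # d/dx k(x, y)
    dky = d / h**2 * k                     # d/dy k(x, y)
    dkxy = (1 / h**2 - d**2 / h**4) * k    # d2/dxdy k(x, y)
    s = -x                                 # score function of N(0, 1)
    k0 = (s[:, None] * s[None, :] * k
          + s[:, None] * dky
          + s[None, :] * dkx
          + dkxy)
    return k0.mean()

rng = np.random.default_rng(0)
good = rng.standard_normal(500)        # samples from the target
bad = rng.standard_normal(500) + 2.0   # shifted away from the target
print(ksd_squared(good) < ksd_squared(bad))  # True
```

A small KSD certifies a good distributional approximation only to the extent the discrepancy controls convergence, which is precisely the property the paper analyzes.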
This paper concerns the training of a single-layer morphological perceptron
using disciplined convex-concave programming (DCCP). We introduce an algorithm
referred to as K-DDCCP, which combines the existing single-layer morphological
perceptron (SLMP) model proposed by Ritter and Urcid with the weighted
disciplined convex-concave programming (WDCCP) algorithm by Charisopoulos and
Maragos. The proposed training algorithm leverages the disciplined
convex-concave procedure (DCCP) and formulates a non-convex optimization
problem for binary classification. To tackle this problem, the constraints are
expressed as differences of convex functions, enabling the application of the
DCCP package. The experimental results confirm the effectiveness of the K-DDCCP
algorithm in solving binary classification problems. Overall, this work
contributes to the field of morphological neural networks by proposing an
algorithm that extends the capabilities of the SLMP model.
( 2
min )
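Morphological perceptrons replace multiply-accumulate with lattice (max-plus / min-plus) operations. The following is a schematic single-layer decision rule with made-up weights, loosely in the spirit of Ritter and Urcid's SLMP rather than a faithful reproduction, and without the DCCP-based training that is the paper's actual contribution:

```python
import numpy as np

def slmp_decision(x, w, m, theta=0.0):
    """Schematic single-layer morphological perceptron: combine a
    dilation (max-plus) and an erosion (min-plus) term into a
    binary decision. Weights here are illustrative, not trained."""
    dilation = np.max(x + w)   # max-plus product
    erosion = np.min(x + m)    # min-plus product
    return 1 if dilation + erosion > theta else 0

x = np.array([0.2, -1.0, 0.5])
w = np.array([0.1, 0.3, -0.2])
m = np.array([0.4, 0.0, 0.6])
print(slmp_decision(x, w, m))  # 0 with these illustrative weights
```

Because max and min are piecewise-linear, the resulting training objective is naturally expressed as a difference of convex functions, which is what makes the DCCP formulation possible.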
Although deep learning-based algorithms have demonstrated excellent
performance in automated emotion recognition via electroencephalogram (EEG)
signals, variations across brain signal patterns of individuals can diminish
the model's effectiveness when applied across different subjects. While
transfer learning techniques have exhibited promising outcomes, they still
encounter challenges related to inadequate feature representations and may
overlook the fact that source subjects themselves can possess distinct
characteristics. In this work, we propose a multi-source domain adaptation
approach with a transformer-based feature generator (MSDA-TF) designed to
leverage information from multiple sources. The proposed feature generator
retains convolutional layers to capture shallow spatial, temporal, and spectral
EEG data representations, while self-attention mechanisms extract global
dependencies within these features. During the adaptation process, we group the
source subjects based on correlation values and aim to align the moments of the
target subject with each source as well as within the sources. MSDA-TF is
validated on the SEED dataset and is shown to yield promising results.
( 2
min )
Distributional Reinforcement Learning (RL) estimates the return distribution
mainly by learning quantile values via minimizing the quantile Huber loss
function, entailing a threshold parameter often selected heuristically or via
hyperparameter search, which may not generalize well and can be suboptimal.
This paper introduces a generalized quantile Huber loss function derived from
Wasserstein distance (WD) calculation between Gaussian distributions, capturing
noise in predicted (current) and target (Bellman-updated) quantile values.
Compared to the classical quantile Huber loss, this innovative loss function
enhances robustness against outliers. Notably, the classical Huber loss
function can be seen as an approximation of our proposed loss, enabling
parameter adjustment by approximating the amount of noise in the data during
the learning process. Empirical tests on Atari games, a common application in
distributional RL, and a recent hedging strategy using distributional RL,
validate the effectiveness of our proposed loss function and its potential for
parameter adjustments in distributional RL.
( 2
min )
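For reference, the classical quantile Huber loss that the paper generalizes looks like this in NumPy; the threshold kappa is exactly the parameter the proposed Gaussian-Wasserstein derivation replaces with a noise-driven quantity:

```python
import numpy as np

def quantile_huber(u, tau, kappa=1.0):
    """Classical quantile Huber loss on the TD error u = target - prediction.
    kappa is the threshold usually set heuristically or by hyperparameter search."""
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u**2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric quantile weighting: under- and over-estimates of the
    # tau-quantile are penalized differently
    return np.abs(tau - (u < 0).astype(float)) * huber

u = np.array([-2.0, -0.5, 0.5, 2.0])
print(quantile_huber(u, tau=0.9))
```

Inside the threshold the loss is quadratic (gradient-friendly near zero error); outside it grows linearly, which is the source of the robustness to outliers that the generalized loss retains.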
A critical factor in trustworthy machine learning is to develop robust
representations of the training data. Only under this guarantee is it
legitimate to artificially generate data, for example, to counteract imbalanced
datasets or provide counterfactual explanations for blackbox decision-making
systems. In recent years, Generative Adversarial Networks (GANs) have shown
considerable results in forming stable representations and generating realistic
data. While many applications focus on generating image data, less effort has
been made in generating time series data, especially multivariate signals. In
this work, a Transformer-based autoencoder is proposed that is regularized
using an adversarial training scheme to generate artificial multivariate time
series signals. The representation is evaluated using t-SNE visualizations,
Dynamic Time Warping (DTW) and Entropy scores. Our results indicate that the
generated signals exhibit higher similarity to an exemplary dataset than using
a convolutional network approach.
( 2
min )
Pipeline parallelism is an essential technique in the training of large-scale
Transformer models. However, it suffers from imbalanced memory consumption,
leading to insufficient memory utilization. The BPipe technique was proposed to
address this issue and has proven effective in the GPT-3 model. Nevertheless,
our experiments have not yielded similar benefits for LLaMA training.
Additionally, BPipe only yields negligible benefits for GPT-3 training when
applying flash attention. We analyze the underlying causes of the divergent
performance of BPipe on GPT-3 and LLaMA. Furthermore, we introduce a novel
method to estimate the performance of BPipe.
( 2
min )
The universal approximation theorem states that a neural network with one
hidden layer can approximate continuous functions on compact sets with any
desired precision. This theorem supports using neural networks for various
applications, including regression and classification tasks. Furthermore, it is
valid for real-valued neural networks and some hypercomplex-valued neural
networks such as complex-, quaternion-, tessarine-, and Clifford-valued neural
networks. However, hypercomplex-valued neural networks are a type of
vector-valued neural network defined on an algebra with additional algebraic or
geometric properties. This paper extends the universal approximation theorem
for a wide range of vector-valued neural networks, including
hypercomplex-valued models as particular instances. Precisely, we introduce the
concept of non-degenerate algebra and state the universal approximation theorem
for neural networks defined on such algebras.
( 2
min )
We introduce a novel sampler called the energy-based diffusion generator for
generating samples from arbitrary target distributions. The sampling model
employs a structure similar to a variational autoencoder, utilizing a decoder
to transform latent variables from a simple distribution into random variables
approximating the target distribution, and we design an encoder based on the
diffusion model. Leveraging the powerful modeling capacity of the diffusion
model for complex distributions, we can obtain an accurate variational estimate
of the Kullback-Leibler divergence between the distributions of the generated
samples and the target. Moreover, we propose a decoder based on generalized
Hamiltonian dynamics to further enhance sampling performance. Through empirical
evaluation, we demonstrate the effectiveness of our method across various
complex distribution functions, showcasing its superiority compared to existing
methods.
( 2
min )
This project explores adversarial training techniques to develop fairer Deep
Neural Networks (DNNs) to mitigate the inherent bias they are known to exhibit.
DNNs are susceptible to inheriting bias with respect to sensitive attributes
such as race and gender, which can lead to life-altering outcomes (e.g.,
demographic bias in facial recognition software used to arrest a suspect). We
propose a robust optimization problem, which we demonstrate can improve
fairness in several datasets, both synthetic and real-world, using an affine
linear model. Leveraging second order information, we are able to find a
solution to our optimization problem more efficiently than a purely first order
method.
( 2
min )
The Ising model is important in statistical modeling and inference in many
applications, however its normalizing constant, mean number of active vertices
and mean spin interaction -- quantities needed in inference -- are
computationally intractable. We provide accurate approximations that make it
possible to numerically calculate these quantities in the homogeneous case.
Simulation studies indicate good performance of our approximation formulae that
are scalable and unfazed by the size (number of nodes, degree of graph) of the
Markov Random Field. The practical import of our approximation formulae is
illustrated in performing Bayesian inference in a functional Magnetic Resonance
Imaging activation detection experiment, and also in likelihood ratio testing
for anisotropy in the spatial patterns of yearly increases in pistachio tree
yields.
( 2
min )
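For very small graphs, the three intractable quantities can still be computed exactly by enumeration, which is how approximation formulae like the paper's are typically validated. A brute-force sketch for a homogeneous 1-D Ising chain (the chain length and inverse temperature are arbitrary illustrative choices):

```python
import itertools
import numpy as np

def ising_summaries(n, beta=0.3):
    """Exact normalizing constant, mean active-vertex count, and mean
    spin interaction for a 1-D Ising chain by brute-force enumeration.
    (The paper's approximations target graphs far too large for this.)"""
    z = 0.0
    mean_active = 0.0
    mean_interaction = 0.0
    for spins in itertools.product([-1, 1], repeat=n):
        s = np.array(spins)
        interaction = np.sum(s[:-1] * s[1:])
        w = np.exp(beta * interaction)
        z += w
        mean_active += w * np.sum(s == 1)
        mean_interaction += w * interaction
    return z, mean_active / z, mean_interaction / z

z, active, interact = ising_summaries(8)
print(z, active, interact)
```

For the free-boundary chain these quantities have closed forms (Z = 2(2 cosh beta)^(n-1), mean interaction (n-1) tanh beta, mean active count n/2 by symmetry), which the enumeration reproduces; on general Markov random fields no such forms exist, hence the need for scalable approximations.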
In this research, we investigate the structural evolution of the cosmic web,
employing advanced methodologies from Topological Data Analysis. Our approach
involves leveraging $Persistence$ $Signals$, an innovative method from recent
literature that facilitates the embedding of persistence diagrams into vector
spaces by re-conceptualizing them as signals in $\mathbb R^2_+$. Utilizing this
methodology, we analyze three quintessential cosmic structures: clusters,
filaments, and voids. A central discovery is the correlation between
$Persistence$ $Energy$ and redshift values, linking persistent homology with
cosmic evolution and providing insights into the dynamics of cosmic structures.
( 2
min )
This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro’s AWS AI/ML Practice. Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models. Data science and DevOps teams may face challenges managing these isolated tool stacks and systems. […]
( 13
min )
Deep generative models have been shown to be problematic in the
unsupervised out-of-distribution (OOD) detection task, where they tend to
assign higher likelihoods to OOD samples. Previous studies on this issue are
usually not applicable to the Variational Autoencoder (VAE). As a popular
subclass of generative models, the VAE can be effective with a relatively
smaller model size and be more stable and faster in training and inference,
which can be more advantageous in real-world applications. In this paper, we
propose a novel VAE-based score called Error Reduction (ER) for OOD detection,
which is based on a VAE that takes a lossy version of the training set as
inputs and the original set as targets. Experiments are carried out on various
datasets to show the effectiveness of our method; we also present the effect of
design choices with ablation experiments. Our code is available at:
https://github.com/ZJLAB-AMMI/VAE4OOD.
( 2
min )
Leakages are a major risk in water distribution networks as they cause water
loss and increase contamination risks. Leakage detection is a difficult task
due to the complex dynamics of water distribution networks. In particular,
small leakages are hard to detect. From a machine-learning perspective,
leakages can be modeled as concept drift. Thus, a wide variety of drift
detection schemes seems to be a suitable choice for detecting leakages. In this
work, we explore the potential of model-loss-based and distribution-based drift
detection methods to tackle leakage detection. We additionally discuss the
issue of temporal dependencies in the data and propose a way to cope with it
when applying distribution-based detection. We evaluate different methods
systematically for leakages of different sizes and detection times.
Additionally, we propose a first drift-detection-based technique for localizing
leakages.
( 2
min )
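A minimal distribution-based detector of the kind discussed can be built from a sliding window and a two-sample Kolmogorov-Smirnov statistic. Everything below (window size, threshold, the synthetic pressure signal) is an illustrative assumption, not the paper's method:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max ECDF distance)."""
    all_vals = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), all_vals, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), all_vals, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def detect_drift(stream, window=100, threshold=0.25):
    """Distribution-based drift detection: compare successive windows of
    recent readings against a fixed reference window."""
    reference = stream[:window]
    alarms = []
    for start in range(window, len(stream) - window, window):
        if ks_statistic(reference, stream[start:start + window]) > threshold:
            alarms.append(start)
    return alarms

rng = np.random.default_rng(0)
normal = rng.normal(5.0, 0.1, size=400)   # nominal pressure signal
leak = rng.normal(4.8, 0.1, size=200)     # a small leakage shifts the mean
stream = np.concatenate([normal, leak])
print(detect_drift(stream))
```

Note that consecutive sensor readings are temporally dependent, which is exactly the complication the abstract flags; a real detector would decorrelate the windows before applying a distribution test.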
In this paper, we investigate on improving the adversarial robustness
obtained in adversarial training (AT) via reducing the difficulty of
optimization. To better study this problem, we build a novel Bregman divergence
perspective for AT, in which AT can be viewed as the sliding process of the
training data points on the negative entropy curve. Based on this perspective,
we analyze the learning objectives of two typical AT methods, i.e., PGD-AT and
TRADES, and we find that the optimization process of TRADES is easier than
that of PGD-AT because TRADES separates the PGD-AT objective. In addition, we
discuss the function
of entropy in TRADES, and we find that models with high entropy can be better
robustness learners. Inspired by the above findings, we propose two methods,
i.e., FAIT and MER, which can both not only reduce the difficulty of
optimization under the 10-step PGD adversaries, but also provide better
robustness. Our work suggests that reducing the difficulty of optimization
under the 10-step PGD adversaries is a promising approach for enhancing the
adversarial robustness in AT.
( 2
min )
In many recent works, there is an increased focus on designing algorithms
that seek flatter optima for neural network loss optimization as there is
empirical evidence that it leads to better generalization performance in many
datasets. In this work, we dissect these performance gains through the lens of
data memorization in overparameterized models. We define a new metric that
identifies the specific data points on which algorithms seeking flatter optima
outperform vanilla SGD. We find that the generalization
gains achieved by Sharpness Aware Minimization (SAM) are particularly
pronounced for atypical data points, which necessitate memorization. This
insight helps us unearth higher privacy risks associated with SAM, which we
verify through exhaustive empirical evaluations. Finally, we propose mitigation
strategies to achieve a more desirable accuracy vs privacy tradeoff.
( 2
min )
Neural algorithmic reasoners are parallel processors. Teaching them
sequential algorithms contradicts this nature, rendering a significant share of
their computations redundant. Parallel algorithms however may exploit their
full computational power, therefore requiring fewer layers to be executed. This
drastically reduces training times, as we observe when comparing parallel
implementations of searching, sorting and finding strongly connected components
to their sequential counterparts on the CLRS framework. Additionally, parallel
versions achieve (often strongly) superior predictive performance.
( 2
min )
We introduce ensembles of stochastic neural networks to approximate the
Bayesian posterior, combining stochastic methods such as dropout with deep
ensembles. The stochastic ensembles are formulated as families of distributions
and trained to approximate the Bayesian posterior with variational inference.
We implement stochastic ensembles based on Monte Carlo dropout, DropConnect and
a novel non-parametric version of dropout and evaluate them on a toy problem
and CIFAR image classification. For both tasks, we test the quality of the
posteriors directly against Hamiltonian Monte Carlo simulations. Our results
show that stochastic ensembles provide more accurate posterior estimates than
other popular baselines for Bayesian inference.
( 2
min )
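The simplest member of this family, Monte Carlo dropout, can be sketched as follows. The two-layer network below is random and untrained with made-up shapes; the point is only that keeping dropout active at test time yields a predictive distribution rather than a point estimate:

```python
import numpy as np

def mc_dropout_predict(x, w1, w2, n_samples=200, p_drop=0.5, seed=0):
    """Monte Carlo dropout: keep dropout active at test time and average
    over stochastic forward passes to approximate the Bayesian posterior
    predictive mean and uncertainty."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_samples):
        mask = rng.uniform(size=w1.shape[1]) > p_drop
        hidden = np.maximum(x @ w1 * mask / (1 - p_drop), 0.0)  # ReLU
        preds.append(hidden @ w2)
    preds = np.array(preds)
    return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and spread

rng = np.random.default_rng(1)
w1 = rng.standard_normal((3, 16))
w2 = rng.standard_normal((16, 1))
mean, std = mc_dropout_predict(np.ones((1, 3)), w1, w2)
print(mean.shape, std.item() > 0)
```

A stochastic ensemble as in the abstract would additionally combine several such stochastic networks and fit them jointly with variational inference.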
Identifying reaction coordinates (RCs) is an active area of research, given
the crucial role RCs play in determining the progress of a chemical reaction.
The choice of the reaction coordinate is often based on heuristic knowledge.
However, an essential criterion for the choice is that the coordinate should
capture both the reactant and product states unequivocally. The coordinate
should also be the slowest one, so that all the other degrees of freedom can
easily equilibrate along it. We used a regularised sparse
autoencoder, an energy-based model, to discover a crucial set of reaction
coordinates. Along with discovering reaction coordinates, our model also
predicts the evolution of a molecular dynamics (MD) trajectory. We showed
that including a sparsity-enforcing regularisation helps in choosing a small
but important set of reaction coordinates. We used two model systems to
demonstrate our approach: the alanine dipeptide system, and a proflavine-DNA
system that exhibited intercalation of proflavine into the DNA minor groove in
an aqueous environment. We model the MD trajectory as a multivariate time
series, and our latent variable model performs the task of multi-step time
series prediction. This idea is inspired by the popular sparse coding
approach: representing each input sample as a linear combination of a few
elements taken from a set of representative patterns.
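The sparse-coding idea invoked above can be illustrated with a minimal lasso solve via iterative soft thresholding (ISTA). The dictionary, input, and regularization strength below are ours, purely for demonstration; the paper's model is an autoencoder, not this classical solver:

```python
import numpy as np

# Represent x as a sparse combination of dictionary atoms by solving the
# lasso problem min_z 0.5*||D z - x||^2 + lam*||z||_1 with ISTA.
D = np.array([[1.0, 0.0, 0.70710678],
              [0.0, 1.0, 0.70710678]])   # three unit-norm atoms in R^2
x = np.array([1.0, 1.0])
lam = 0.1

def ista(D, x, lam, n_iter=500):
    step = 1.0 / np.linalg.eigvalsh(D.T @ D).max()  # 1/L step for convergence
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)
        u = z - step * grad
        z = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # soft threshold
    return z

z = ista(D, x, lam)
# The third atom points along x, so the code concentrates on that atom alone.
```

The L1 penalty plays the same role as the sparsity-enforcing regularisation in the abstract: only a small subset of representative patterns ends up active.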
( 3
min )
Artificial intelligence (AI), machine learning, and deep learning (DL)
methods are becoming increasingly important in the field of biomedical image
analysis. However, to exploit the full potential of such methods, a
representative number of experimentally acquired images containing a
significant number of manually annotated objects is needed as training data.
Here we introduce SYNTA (synthetic data) as a novel approach for the generation
of synthetic, photo-realistic, and highly complex biomedical images as training
data for DL systems. We show the versatility of our approach in the context of
muscle fiber and connective tissue analysis in histological sections. We
demonstrate that it is possible to perform robust and expert-level segmentation
tasks on previously unseen real-world data using synthetic training data alone,
without the need for manual annotations. Being a fully parametric
technique, our approach poses an interpretable and controllable alternative to
Generative Adversarial Networks (GANs) and has the potential to significantly
accelerate quantitative image analysis in a variety of biomedical applications
in microscopy and beyond.
( 3
min )
We present a new method that includes three key components of distributed
optimization and federated learning: variance reduction of stochastic
gradients, partial participation, and compressed communication. We prove that
the new method has optimal oracle complexity and state-of-the-art communication
complexity in the partial participation setting. Regardless of the
communication compression feature, our method successfully combines variance
reduction and partial participation: we get the optimal oracle complexity,
never need the participation of all nodes, and do not require the bounded
gradients (dissimilarity) assumption.
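Two of the three ingredients, partial participation and compressed communication, can be sketched in a toy aggregation round. This is our own illustration (top-k sparsification as the compressor, random client sampling), not the paper's algorithm, and it omits the variance-reduction component entirely:

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, k = 10, 8, 3   # illustrative sizes

def top_k(v, k):
    # Keep only the k largest-magnitude coordinates (a biased compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def round_update(client_grads, participation=0.5):
    # Partial participation: sample a subset of clients; each compresses
    # its gradient before communicating, and the server averages.
    m = max(1, int(participation * len(client_grads)))
    chosen = rng.choice(len(client_grads), size=m, replace=False)
    messages = [top_k(client_grads[i], k) for i in chosen]
    return np.mean(messages, axis=0)

grads = [rng.normal(size=dim) for _ in range(n_clients)]
update = round_update(grads)
```

The point of the paper is that such a scheme can be combined with variance reduction while keeping optimal oracle complexity; the sketch only shows the communication pattern.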
( 2
min )
This document outlines some of the common mistakes that occur when using
machine learning, and what can be done to avoid them. Whilst it should be
accessible to anyone with a basic understanding of machine learning techniques,
it was originally written for research students, and focuses on issues that are
of particular concern within academic research, such as the need to do rigorous
comparisons and reach valid conclusions. It covers five stages of the machine
learning process: what to do before model building, how to reliably build
models, how to robustly evaluate models, how to compare models fairly, and how
to report results.
( 2
min )
A simple and effective method for the alignment of generative models is the
best-of-$n$ policy, where $n$ samples are drawn from a base policy, ranked
according to a reward function, and the highest-ranking one is selected. A commonly
used analytical expression in the literature claims that the KL divergence
between the best-of-$n$ policy and the base policy is equal to $\log (n) -
(n-1)/n.$ We disprove the validity of this claim, and show that it is an upper
bound on the actual KL divergence. We also explore the tightness of this upper
bound in different regimes. Finally, we propose a new estimator for the KL
divergence and empirically show that it provides a tight approximation through
a few examples.
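For a discrete base policy, the best-of-$n$ distribution, and hence the exact KL divergence, can be computed in closed form, which makes the gap to the $\log(n) - (n-1)/n$ expression easy to check numerically. A small sketch (our toy example, not the paper's estimator):

```python
import math

# Outcomes are indexed by increasing reward, so best-of-n picks the draw
# with the largest index among n i.i.d. samples from the base policy p.
def best_of_n_policy(p, n):
    # pi(y_i) = F(i)^n - F(i-1)^n: probability y_i is the best of n draws.
    cdf, prev, pi = 0.0, 0.0, []
    for p_y in p:
        cdf += p_y
        pi.append(cdf ** n - prev ** n)
        prev = cdf
    return pi

def kl(q, p):
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0)

p = [0.25, 0.25, 0.25, 0.25]        # uniform base policy, distinct rewards
n = 2
d = kl(best_of_n_policy(p, n), p)
bound = math.log(n) - (n - 1) / n   # the commonly claimed expression
```

On this example `d` is strictly below `bound`, consistent with the paper's point that the analytical expression is only an upper bound for discrete (atomic) distributions.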
( 2
min )
This work examines the effects of variations in machine learning training
regimes and learning paradigms on the corresponding energy consumption. While
increasing data availability and innovation in high-performance hardware fuel
the training of sophisticated models, they also contribute to a fading
awareness of energy consumption and carbon emissions. Therefore, the goal of
this work is to create awareness about the energy impact of general training
parameters and processes, ranging from learning rate and batch size to
knowledge transfer. Multiple
setups with different hyperparameter initializations are evaluated on two
different hardware configurations to obtain meaningful results. Experiments on
pretraining and multitask training are conducted on top of the baseline results
to determine their potential towards sustainable machine learning.
( 2
min )
Data augmentation is an effective technique for improving the performance of
machine learning models. However, it has not been explored as extensively in
natural language processing (NLP) as it has in computer vision. In this paper,
we propose a novel text augmentation method that leverages the Fill-Mask
feature of the transformer-based BERT model. Our method involves iteratively
masking words in a sentence and replacing them with language model predictions.
We have tested our proposed method on various NLP tasks and found it to be
effective in many cases. Our results are presented along with a comparison to
existing augmentation methods. Experimental results show that our proposed
method significantly improves performance, especially on topic classification
datasets.
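The iterative mask-and-replace loop described above can be sketched compactly. A real implementation would query a BERT fill-mask pipeline; `toy_fill_mask` below is a hypothetical stand-in so the example stays self-contained:

```python
def augment(sentence, fill_mask, positions):
    # Mask the chosen positions one at a time; each replacement is made in
    # the already-augmented sentence, so later predictions condition on
    # earlier substitutions.
    tokens = sentence.split()
    for i in positions:
        context = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        prediction = fill_mask(context)
        if prediction is not None:
            tokens[i] = prediction
    return " ".join(tokens)

def toy_fill_mask(context):
    # Stand-in for a language model: predict the [MASK] token from its left
    # neighbour using a tiny hand-written table.
    i = context.index("[MASK]")
    left = context[i - 1] if i > 0 else None
    return {"the": "film", "was": "great"}.get(left)

augment("the movie was good", toy_fill_mask, positions=[1, 3])
# -> "the film was great"
```

Swapping `toy_fill_mask` for an actual masked-language-model call yields context-aware paraphrases of the training sentence.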
( 2
min )
According to the World Health Organization (WHO), air pollution kills seven
million people every year. Outdoor air pollution is a major environmental
health problem affecting low, middle, and high-income countries. In the past
few years, the research community has explored IoT-enabled machine learning
applications for outdoor air pollution prediction. The general objective of
this paper is to systematically review applications of machine learning and
Internet of Things (IoT) for outdoor air pollution prediction and the
combination of monitoring sensors and input features used. Two research
questions were formulated for this review. 1086 publications were collected in
the initial PRISMA stage. After the screening and eligibility phases, 37 papers
were selected for inclusion. A cost-based analysis was conducted on the
findings to highlight high-cost monitoring, low-cost IoT and hybrid enabled
prediction. Three methods of prediction were identified: time series,
feature-based and spatio-temporal. This review's findings identify major
limitations in applications found in the literature, namely lack of coverage,
lack of diversity of data and lack of inclusion of context-specific features.
This review proposes directions for future research and underlines practical
implications in healthcare, urban planning, global synergy and smart cities.
( 2
min )
Explainability in deep networks has gained increased importance in recent
years. We argue herein that an AI must be tasked not just with a task but also
with an explanation of why said task was accomplished as such. We present a
basic framework -- Task and Explanation Network (TENet) -- which fully
integrates task completion and its explanation. We believe that the field of AI
as a whole should insist -- quite emphatically -- on explainability.
( 2
min )
The generalization error curve of a kernel regression method characterizes
the exact order of the generalization error under various source conditions,
noise levels, and choices of the regularization parameter, rather than only
the minimax rate. In this work, under mild assumptions, we rigorously provide
a full characterization of the generalization error curves of the kernel
gradient descent method (and a large class of analytic spectral algorithms) in
kernel regression. Consequently, we sharpen the near-inconsistency of kernel
interpolation and clarify the saturation effect of kernel regression
algorithms with higher qualification. Thanks to neural tangent kernel theory,
these results greatly improve our understanding of the generalization behavior
of training wide neural networks. A novel technical contribution, the analytic
functional argument, might be of independent interest.
( 2
min )
When implementing hierarchical federated learning over wireless networks,
scalability assurance and the ability to handle both interference and device
data heterogeneity are crucial. This work introduces a learning method designed
to address these challenges, along with a scalable transmission scheme that
efficiently uses a single wireless resource through over-the-air computation.
To provide resistance against data heterogeneity, we employ gradient
aggregations. Meanwhile, the impact of interference is minimized through
optimized receiver normalizing factors. For this, we model a multi-cluster
wireless network using stochastic geometry, and characterize the mean squared
error of the aggregation estimations as a function of the network parameters.
We show that despite the interference and the data heterogeneity, the proposed
scheme achieves high learning accuracy and can significantly outperform the
conventional hierarchical algorithm.
( 2
min )
In the rapidly evolving field of artificial intelligence, the creation and
utilization of synthetic datasets have become increasingly significant. This
report delves into the multifaceted aspects of synthetic data, particularly
emphasizing the challenges and potential biases these datasets may harbor. It
explores the methodologies behind synthetic data generation, spanning
traditional statistical models to advanced deep learning techniques, and
examines their applications across diverse domains. The report also critically
addresses the ethical considerations and legal implications associated with
synthetic datasets, highlighting the urgent need for mechanisms to ensure
fairness, mitigate biases, and uphold ethical standards in AI development.
( 2
min )
The last decades have been characterized by unprecedented technological
advances, many of them powered by modern technologies such as Artificial
Intelligence (AI) and Machine Learning (ML). The world has become more
digitally connected than ever, but we face major challenges. One of the most
significant is cybercrime, which has emerged as a global threat to governments,
businesses, and civil societies. The pervasiveness of digital technologies
combined with a constantly shifting technological foundation has created a
complex and powerful playground for cybercriminals, which triggered a surge in
demand for intelligent threat detection systems based on machine and deep
learning. This paper investigates AI-based cyber threat detection to protect
our modern digital ecosystems. The primary focus is on evaluating ML-based
classifiers and ensembles for anomaly-based malware detection and network
intrusion detection and how to integrate those models in the context of network
security, mobile security, and IoT security. The discussion highlights the
challenges when deploying and integrating AI-enabled cybersecurity solutions
into existing enterprise systems and IT infrastructures, including options to
overcome those challenges. Finally, the paper provides future research
directions to further increase the security and resilience of our modern
digital industries, infrastructures, and ecosystems.
( 2
min )
Existing object recognition models have been shown to lack robustness in
diverse geographical scenarios due to significant domain shifts in design and
context. Class representations need to be adapted to more accurately reflect an
object concept under these shifts. In the absence of training data from target
geographies, we hypothesize that geography-specific descriptive knowledge of
object categories can be leveraged to enhance robustness. For this purpose, we
explore the feasibility of probing a large-language model for
geography-specific object knowledge, and we investigate integrating knowledge
in zero-shot and learnable soft prompting with the CLIP vision-language model.
In particular, we propose a geography knowledge regularization method to ensure
that soft prompts trained on a source set of geographies generalize to an
unseen target set of geographies. Our gains on DollarStreet when generalizing
from a model trained only on data from Europe are as large as +2.8 on countries
from Africa, and +4.6 on the hardest classes. We further show competitive
performance vs. few-shot target training, and provide insights into how
descriptive knowledge captures geographical differences.
( 2
min )
Defect detection is one of the most important yet challenging tasks in the
quality control stage in the manufacturing sector. In this work, we introduce a
Tensor Convolutional Neural Network (T-CNN) and examine its performance on a
real defect detection application in one of the components of the ultrasonic
sensors produced at Robert Bosch's manufacturing plants. Our quantum-inspired
T-CNN operates on a reduced model parameter space to substantially improve the
training speed and performance of an equivalent CNN model without sacrificing
accuracy. More specifically, we demonstrate how T-CNNs are able to reach the
same performance as classical CNNs as measured by quality metrics, with up to
fifteen times fewer parameters and 4% to 19% faster training times. Our results
demonstrate that the T-CNN greatly outperforms the results of traditional human
visual inspection, providing value in a current real application in
manufacturing.
( 2
min )
Recent research has shown the potential of deep learning in multi-parametric
MRI-based visual pathway (VP) segmentation. However, obtaining labeled data for
training is laborious and time-consuming. Therefore, it is crucial to develop
effective algorithms in situations with limited labeled samples. In this work,
we propose a label-efficient deep learning method with self-ensembling (LESEN).
LESEN incorporates supervised and unsupervised losses, enabling the student and
teacher models to mutually learn from each other, forming a self-ensembling
mean teacher framework. Additionally, we introduce a reliable unlabeled sample
selection (RUSS) mechanism to further enhance LESEN's effectiveness. Our
experiments on the human connectome project (HCP) dataset demonstrate the
superior performance of our method when compared to state-of-the-art
techniques, advancing multimodal VP segmentation for comprehensive analysis in
clinical and research settings. The implementation code will be available at:
https://github.com/aldiak/Semi-Supervised-Multimodal-Visual-Pathway-
Delineation.
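The mean-teacher component of such self-ensembling frameworks reduces to an exponential moving average (EMA) of the student's weights. A generic sketch (values illustrative; LESEN's actual supervised/unsupervised losses and the RUSS selection are not shown):

```python
# Teacher weights track an EMA of student weights, as in mean-teacher
# self-ensembling; weight vectors here are toy two-parameter lists.
def ema_update(teacher, student, decay=0.99):
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]

teacher = [0.0, 0.0]
student = [1.0, -1.0]
for _ in range(300):
    # In training, the student would first take a gradient step on the
    # supervised + consistency losses; here its weights stay fixed.
    teacher = ema_update(teacher, student)
# After many steps the teacher closely tracks the student.
```

The slowly moving teacher provides stable targets for the consistency loss, which is what lets the student and teacher "mutually learn from each other".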
( 2
min )
A new variant of Newton's method - named Backtracking New Q-Newton's method
(BNQN) - which has strong theoretical guarantee, is easy to implement, and has
good experimental performance, was recently introduced by the third author.
Experiments performed previously showed some remarkable properties of the
basins of attraction for finding roots of polynomials and meromorphic
functions with BNQN. In general, these basins look smoother than those of
Newton's method.
In this paper, we continue to experimentally explore in depth this remarkable
phenomenon, and connect BNQN to Newton's flow and Voronoi diagrams. This link
poses a couple of challenging puzzles to be explained. Experiments also
indicate that BNQN is more robust against random perturbations than Newton's
method and Random Relaxed Newton's method.
( 2
min )
Machine learning models underpin many modern financial systems for use cases
such as fraud detection and churn prediction. Most are based on supervised
learning with hand-engineered features, which relies heavily on the
availability of labelled data. Large self-supervised generative models have
shown tremendous success in natural language processing and computer vision,
yet so far they haven't been adapted to multivariate time series of financial
transactions. In this paper, we present a generative pretraining method that
can be used to obtain contextualised embeddings of financial transactions.
Benchmarks on public datasets demonstrate that it outperforms state-of-the-art
self-supervised methods on a range of downstream tasks. We additionally perform
large-scale pretraining of an embedding model using a corpus of data from 180
issuing banks containing 5.1 billion transactions and apply it to the card
fraud detection problem on hold-out datasets. The embedding model significantly
improves value detection rate at high precision thresholds and transfers well
to out-of-domain distributions.
( 2
min )
Generative AI has opened up a lot of potential in the field of AI. We are seeing numerous uses, including text generation, code generation, summarization, translation, chatbots, and more. One such area that is evolving is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing […]
( 10
min )
Expanded LLM use creates new demands on cloud GPU capacity. Splitwise presents an efficient solution by separating the two essential phases of LLM inference, achieving higher throughput within a limited power budget.
The post Splitwise improves GPU usage by splitting LLM inference phases appeared first on Microsoft Research.
( 10
min )
Celebrate the new year with more cloud gaming. Experience the power and performance of the cloud with more than 20 new games to be added to GeForce NOW in January. Start with five games available this week, including The Finals from Embark Studios. And tune in to the NVIDIA Special Address at CES on Monday.
( 7
min )
Finding a transformation between two unknown probability distributions from
finite samples is crucial for modeling complex data distributions and
performing tasks such as sample generation, domain adaptation and statistical
inference. One powerful framework for such transformations is normalizing flow,
which transforms an unknown distribution into a standard normal distribution
using an invertible network. In this paper, we introduce a novel model called
SyMOT-Flow that trains an invertible transformation by minimizing the symmetric
maximum mean discrepancy between samples from two unknown distributions, and an
optimal transport cost is incorporated as regularization to obtain a
short-distance and interpretable transformation. The resulting transformation
leads to more stable and accurate sample generation. Several theoretical
results are established for the proposed model and its effectiveness is
validated with low-dimensional illustrative examples as well as
high-dimensional bi-modality medical image generation through the forward and
reverse flows.
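The maximum mean discrepancy at the heart of the objective can be sketched directly. This is our own illustration (Gaussian kernel, biased estimator), not the paper's code; SyMOT-Flow minimizes a symmetrized statistic of this kind, with an optimal-transport cost added as regularization:

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel between rows of a and b.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    # Biased estimator of squared MMD; non-negative, and zero when the
    # empirical kernel mean embeddings coincide. Symmetric in x and y.
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y_close = rng.normal(size=(100, 2))        # same distribution as x
y_far = rng.normal(size=(100, 2)) + 3.0    # shifted distribution
```

Minimizing such a statistic over an invertible network's outputs pulls the transformed samples toward the target distribution, which is the training signal described above.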
( 2
min )
Identifying constitutive parameters in engineering and biological materials,
particularly those with intricate geometries and mechanical behaviors, remains
a longstanding challenge. The recent advent of Physics-Informed Neural Networks
(PINNs) offers promising solutions, but current frameworks are often limited to
basic constitutive laws and encounter practical constraints when combined with
experimental data. In this paper, we introduce a robust PINN-based framework
designed to identify material parameters for soft materials, specifically those
exhibiting complex constitutive behaviors, under large deformation in plane
stress conditions. Distinctively, our model emphasizes training PINNs with
multi-modal synthetic experimental datasets consisting of full-field
deformation and loading history, ensuring algorithm robustness even with noisy
data. Our results reveal that the PINNs framework can accurately identify
constitutive parameters of the incompressible Arruda-Boyce model for samples
with intricate geometries, maintaining an error below 5%, even with an
experimental noise level of 5%. We believe our framework provides a robust
modulus identification approach for complex solids, especially for those with
geometrical and constitutive complexity.
( 2
min )
Causal inference is a crucial goal of science, enabling researchers to arrive
at meaningful conclusions regarding the predictions of hypothetical
interventions using observational data. Path models, Structural Equation Models
(SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to
unambiguously specify assumptions regarding the causal structure underlying a
phenomenon. Unlike DAGs, which make very few assumptions about the functional
and parametric form, SEMs assume linearity. This can result in functional
misspecification which prevents researchers from undertaking reliable effect
size estimation. In contrast, we propose Super Learner Equation Modeling, a
path modeling technique integrating machine learning Super Learner ensembles.
We empirically demonstrate its ability to provide consistent and unbiased
estimates of causal effects, its competitive performance for linear models when
compared with SEM, and highlight its superiority over SEM when dealing with
non-linear relationships. We provide open-source code, and a tutorial notebook
with example usage, accentuating the easy-to-use nature of the method.
( 2
min )
How do language models "think"? This paper formulates a probabilistic
cognitive model called the bounded pragmatic speaker, which can characterize
the operation of different variations of language models. Specifically, we
demonstrate that large language models fine-tuned with reinforcement learning
from human feedback (Ouyang et al., 2022) embody a model of thought that
conceptually resembles a fast-and-slow model (Kahneman, 2011), which
psychologists have attributed to humans. We discuss the limitations of
reinforcement learning from human feedback as a fast-and-slow model of thought
and propose avenues for expanding this framework. In essence, our research
highlights the value of adopting a cognitive probabilistic modeling approach to
gain insights into the comprehension, evaluation, and advancement of language
models.
( 2
min )
Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for
learning dynamical systems that can be modelled by ordinary differential
equations. In this paper, we extend the method to partial differential
equations. The resulting model comprises up to three neural networks,
modelling terms representing conservation, dissipation and external forces, and
discrete convolution operators that can either be learned or be given as input.
We demonstrate numerically the superior performance of PHNN compared to a
baseline model that models the full dynamics by a single neural network.
Moreover, since the PHNN model consists of three parts with different physical
interpretations, these can be studied separately to gain insight into the
system, and the learned model is applicable also if external forces are removed
or changed.
( 2
min )
In this paper, we propose a novel method for joint entity and relation
extraction from unstructured text by framing it as a conditional sequence
generation problem. In contrast to conventional generative information
extraction models that are left-to-right token-level generators, our approach
is \textit{span-based}. It generates a linearized graph where nodes represent
text spans and edges represent relation triplets. Our method employs a
transformer encoder-decoder architecture with pointing mechanism on a dynamic
vocabulary of spans and relation types. Our model can capture the structural
characteristics and boundaries of entities and relations through span
representations while simultaneously grounding the generated output in the
original text thanks to the pointing mechanism. Evaluation on benchmark
datasets validates the effectiveness of our approach, demonstrating competitive
results. Code is available at https://github.com/urchade/ATG.
( 2
min )
In this work we present deep learning implementations of two popular
theoretical constrained optimization algorithms in infinite dimensional Hilbert
spaces, namely, the penalty and the augmented Lagrangian methods. We test these
algorithms on some toy problems originating in either calculus of variations or
physics. We demonstrate that both methods are able to produce decent
approximations for the test problems and are comparable in terms of different
errors. Leveraging the common occurrence of the Lagrange multiplier update rule
being computationally less expensive than solving subproblems in the penalty
method, we achieve significant speedups in cases when the output of the
constraint function is itself a function.
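In finite dimensions the two methods compared above reduce to familiar loops. The sketch below is our own toy (the paper works in infinite-dimensional Hilbert spaces with neural networks; problem, step sizes, and schedules here are illustrative): minimize f(x) = x1^2 + x2^2 subject to x1 + x2 = 1, whose solution is (0.5, 0.5):

```python
def grad_descent(grad, x, lr=0.05, steps=400):
    # Plain gradient descent as the inner subproblem solver.
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x

def penalty_method(outer=6):
    x, mu = [0.0, 0.0], 1.0
    for _ in range(outer):
        def grad(x):
            g = x[0] + x[1] - 1.0
            return [2 * x[0] + mu * g, 2 * x[1] + mu * g]
        x = grad_descent(grad, x, lr=1.0 / (2.0 + 2.0 * mu))
        mu *= 10.0                        # subproblems harden as mu grows
    return x

def augmented_lagrangian(mu=10.0, outer=20):
    x, lam = [0.0, 0.0], 0.0
    for _ in range(outer):
        def grad(x):
            g = x[0] + x[1] - 1.0
            common = lam + mu * g         # from lam*g + (mu/2)*g^2
            return [2 * x[0] + common, 2 * x[1] + common]
        x = grad_descent(grad, x)
        lam += mu * (x[0] + x[1] - 1.0)   # cheap multiplier update
    return x

xp = penalty_method()
xa = augmented_lagrangian()
```

The augmented Lagrangian converges with a moderate fixed mu thanks to the cheap multiplier update, whereas the penalty method must grow mu and re-solve harder subproblems, the cost asymmetry the speedups above exploit.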
( 2
min )
Broadband infrastructure owners do not always know how their customers are
connected in the local networks, which are structured as rooted trees. A recent
study is able to infer the topology of a local network using discrete time
series data from the leaves of the tree (customers). In this study we propose a
contrastive approach for learning a binary event encoder from continuous time
series data. As a preliminary result, we show that our approach has some
potential in learning a valuable encoder.
( 2
min )
This paper introduces HAAQI-Net, a non-intrusive deep learning model for
music quality assessment tailored to hearing aid users. In contrast to
traditional methods like the Hearing Aid Audio Quality Index (HAAQI), HAAQI-Net
utilizes a Bidirectional Long Short-Term Memory (BLSTM) with attention. It
takes an assessed music sample and a hearing loss pattern as input, generating
a predicted HAAQI score. The model employs the pre-trained Bidirectional
Encoder representation from Audio Transformers (BEATs) for acoustic feature
extraction. Comparing predicted scores with ground truth, HAAQI-Net achieves a
Longitudinal Concordance Correlation (LCC) of 0.9257, Spearman's Rank
Correlation Coefficient (SRCC) of 0.9394, and Mean Squared Error (MSE) of
0.0080. Notably, this high performance comes with a substantial reduction in
inference time: from 62.52 seconds (by HAAQI) to 2.71 seconds (by HAAQI-Net),
serving as an efficient music quality assessment model for hearing aid users.
( 2
min )
Modern healthcare often utilises radiographic images alongside textual
reports for diagnostics, encouraging the use of Vision-Language Self-Supervised
Learning (VL-SSL) with large pre-trained models to learn versatile medical
vision representations. However, most existing VL-SSL frameworks are trained
end-to-end, which is computation-heavy and can lose vital prior information
embedded in pre-trained encoders. To address both issues, we introduce the
backbone-agnostic Adaptor framework, which preserves medical knowledge in
pre-trained image and text encoders by keeping them frozen, and employs a
lightweight Adaptor module for cross-modal learning. Experiments on medical
image classification and segmentation tasks across three datasets reveal that
our framework delivers competitive performance while cutting trainable
parameters by over 90% compared to current pre-training approaches. Notably,
when fine-tuned with just 1% of data, Adaptor outperforms several
Transformer-based methods trained on full datasets in medical image
segmentation.
( 2
min )
We present a new high-probability PAC-Bayes oracle bound for unbounded
losses. This result can be understood as a PAC-Bayes version of the Chernoff
bound. The proof technique relies on uniformly bounding the tail of a certain
random variable based on the Cram\'er transform of the loss. We highlight two
applications of our main result. First, we show that our bound solves the open
problem of optimizing the free parameter on many PAC-Bayes bounds. Finally, we
show that our approach allows working with flexible assumptions on the loss
function, resulting in novel bounds that generalize previous ones and can be
minimized to obtain Gibbs-like posteriors.
( 2
min )
We present a fast and high-quality codec language model for parallel audio
generation. While SoundStorm, a state-of-the-art parallel audio generation
model, accelerates inference speed compared to autoregressive models, it still
suffers from slow inference due to iterative sampling. To resolve this problem,
we propose Group-Masked Language Modeling~(G-MLM) and Group Iterative Parallel
Decoding~(G-IPD) for efficient parallel audio generation. Both the training and
sampling schemes enable the model to synthesize high-quality audio with a small
number of iterations by effectively modeling the group-wise conditional
dependencies. In addition, our model employs a cross-attention-based
architecture to capture the speaker style of the prompt voice and improves
computational efficiency. Experimental results demonstrate that our proposed
model outperforms the baselines in prompt-based audio generation.
( 2
min )
The prediction of rolling bearing lifespan is of significant importance in
industrial production. However, the scarcity of high-quality, full lifecycle
data has been a major constraint in achieving precise predictions. To address
this challenge, this paper introduces the CVGAN model, a novel framework
capable of generating one-dimensional vibration signals in both horizontal and
vertical directions, conditioned on historical vibration data and remaining
useful life. In addition, we propose an autoregressive generation method that
can iteratively utilize previously generated vibration information to guide the
generation of current signals. The effectiveness of the CVGAN model is
validated through experiments conducted on the PHM 2012 dataset. Our findings
demonstrate that the CVGAN model, in terms of both MMD and FID metrics,
outperforms many advanced methods in both autoregressive and non-autoregressive
generation modes. Notably, training using the full lifecycle data generated by
the CVGAN model significantly improves the performance of the predictive model.
This result highlights the effectiveness of the data generated by CVGAN in
enhancing the predictive power of these models.
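The autoregressive generation loop described above can be sketched as follows. This is a toy stand-in, not the CVGAN itself: `generate_step` here just damps the previous chunk, whereas the real model would be a trained conditional generator; all names and the chunk format are illustrative.

```python
# Sketch of autoregressive signal generation: each step conditions on the most
# recent window of (real or previously generated) vibration chunks plus the
# remaining useful life (RUL), and the output is fed back as new history.

def generate_step(history, rul):
    """Stand-in for the CVGAN generator. `rul` is unused by this placeholder
    but would condition the real generator; here we just damp the last chunk."""
    last = history[-1]
    return [0.9 * x for x in last]

def autoregressive_generate(seed_chunks, total_life, n_steps):
    history = list(seed_chunks)
    for t in range(n_steps):
        rul = total_life - t          # remaining useful life at this step
        chunk = generate_step(history, rul)
        history.append(chunk)         # feed generated signal back in
    return history

seed = [[1.0, -1.0, 0.5, -0.5]]
out = autoregressive_generate(seed, total_life=10, n_steps=3)
```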
( 2
min )
Natural policy gradient (NPG) and its variants are widely used policy search
methods in reinforcement learning. Inspired by prior work, a new NPG variant
coined NPG-HM is developed in this paper, which utilizes the Hessian-aided
momentum technique for variance reduction, while the sub-problem is solved via
the stochastic gradient descent method. It is shown that NPG-HM can achieve the
global last iterate $\epsilon$-optimality with a sample complexity of
$\mathcal{O}(\epsilon^{-2})$, which is the best known result for natural policy
gradient type methods under the generic Fisher non-degenerate policy
parameterizations. The convergence analysis is built upon a relaxed weak
gradient dominance property tailored for NPG under the compatible function
approximation framework, as well as a neat way to decompose the error when
handling the sub-problem. Moreover, numerical experiments on MuJoCo-based
environments demonstrate the superior performance of NPG-HM over other
state-of-the-art policy gradient methods.
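The Hessian-aided momentum idea can be illustrated on a deterministic toy problem. This is one common form of such an estimator, not the paper's stochastic NPG-HM: the gradient estimate is corrected by a Hessian-vector product that tracks how the gradient changed between iterates. All names and the quadratic objective are illustrative.

```python
# Hessian-aided momentum on f(x) = 0.5*a*x^2, where grad(x) = a*x and the
# Hessian-vector product is H(v) = a*v. With exact gradients the estimator d
# tracks the true gradient, and the iterates contract towards the minimum 0.

def grad(x, a=2.0):
    return a * x

def hvp(v, a=2.0):  # Hessian-vector product for the quadratic
    return a * v

def hessian_momentum_descent(x0, lr=0.1, beta=0.5, steps=50):
    x, d = x0, grad(x0)
    for _ in range(steps):
        x_new = x - lr * d
        # momentum estimator corrected by a Hessian-vector product term,
        # which approximates the change of the gradient between iterates
        d = beta * grad(x_new) + (1 - beta) * (d + hvp(x_new - x))
        x = x_new
    return x

x_star = hessian_momentum_descent(5.0)
```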
( 2
min )
Exploring methods and techniques of machine learning (ML) to address specific
challenges in various fields is essential. In this work, we tackle a problem in
the domain of Cheminformatics; that is, providing a suitable solution to aid in
predicting the activity of a chemical compound to the best extent possible. To
address the problem at hand, this study conducts experiments on 100 different
combinations of existing techniques. These solutions are then selected based on
a set of criteria that includes the G-means, F1-score, and AUC metrics. The
results have been tested on a dataset of about 10,000 chemical compounds from
PubChem that have been classified according to their activity.
( 2
min )
Suicide is recognized as one of the most serious concerns in modern society.
Suicide causes tragedies that affect countries, communities, and
families. There are many factors that lead to suicidal ideations. Early
detection of suicidal ideations can help to prevent suicide occurrence by
providing the victim with the required professional support, especially when
the victim does not recognize the danger of having suicidal ideations. As
technology usage has increased, people share and express their ideations
digitally via social media, chatbots, and other digital platforms. In this
paper, we propose a novel, simple deep learning-based model to detect suicidal
ideations in digital content, mainly focusing on chatbots as the primary data
source. In addition, we provide a framework that employs the proposed suicide
detection integration with a chatbot-based support system.
( 2
min )
The primary goal of this project is to develop privacy-preserving machine
learning model training techniques for fNIRS data. This project will build a
local model in a centralized setting with both differential privacy (DP) and
certified robustness. It will also explore collaborative federated learning to
train a shared model between multiple clients without sharing local fNIRS
datasets. To prevent unintentional private information leakage of such clients'
private datasets, we will also implement DP in the federated learning setting.
( 2
min )
Exploring generative model training for synthetic tabular data, specifically
in sequential contexts such as credit card transaction data, presents
significant challenges. This paper addresses these challenges, focusing on
attaining both high fidelity to actual data and optimal utility for machine
learning tasks. We introduce five pre-processing schemas to enhance the
training of the Conditional Probabilistic Auto-Regressive Model (CPAR),
demonstrating incremental improvements in the synthetic data's fidelity and
utility. Upon achieving satisfactory fidelity levels, our attention shifts to
training fraud detection models tailored for time-series data, evaluating the
utility of the synthetic data. Our findings offer valuable insights and
practical guidelines for synthetic data practitioners in the finance sector,
transitioning from real to synthetic datasets for training purposes, and
illuminating broader methodologies for synthesizing credit card transaction
time series.
( 2
min )
In this paper, we present an unsupervised approach for frequency sub-band
allocation in wireless networks using graph-based learning. We consider a dense
deployment of subnetworks in a factory environment with a limited number of
sub-bands, which must be optimally allocated to coordinate inter-subnetwork
interference. We model the subnetwork deployment as a conflict graph and
propose an unsupervised learning approach inspired by the graph colouring
heuristic and the Potts model to optimize the sub-band allocation using graph
neural networks. The numerical evaluation shows that the proposed method
achieves close performance to the centralized greedy colouring sub-band
allocation heuristic with lower computational time complexity. In addition, it
incurs reduced signalling overhead compared to iterative optimization
heuristics that require all the mutual interfering channel information. We
further demonstrate that the method is robust to different network settings.
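The Potts-style objective that such an unsupervised colouring loss builds on can be sketched as follows. This is a toy stand-in, not the paper's GNN training code: for soft sub-band assignments (a softmax per node), the loss sums, over every interfering pair, the probability that both endpoints pick the same sub-band, so zero loss means a conflict-free allocation. The graph and logits are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [x / s for x in e]

def potts_loss(logits, edges):
    """Sum over conflict edges of the probability that both nodes
    choose the same sub-band (dot product of their soft assignments)."""
    probs = [softmax(l) for l in logits]
    return sum(sum(pu * pv for pu, pv in zip(probs[u], probs[v]))
               for u, v in edges)

# Triangle conflict graph, 3 sub-bands.
edges = [(0, 1), (1, 2), (0, 2)]
conflict = [[9.0, 0.0, 0.0]] * 3                 # all nodes prefer sub-band 0
good = [[9.0, 0, 0], [0, 9.0, 0], [0, 0, 9.0]]   # all nodes distinct

assert potts_loss(conflict, edges) > potts_loss(good, edges)
```

A GNN trained to minimize this loss over the conflict graph would push node assignments towards the low-loss, conflict-free configuration.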
( 2
min )
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an
XGBoost binary classification model predicting the Intensive Care Unit (ICU)
length of stay (LOS). Highlighting the critical role of the ICU in managing
critically ill patients, the study addresses the growing strain on ICU
capacity. It emphasizes the significance of LOS prediction for resource
allocation. The research reveals class imbalances in the dataset across
demographic attributes and employs data preprocessing and feature extraction.
While the XGBoost model performs well overall, disparities across race and
insurance attributes reflect the need for tailored assessments and continuous
monitoring. The paper concludes with recommendations for fairness-aware machine
learning techniques for mitigating biases and the need for collaborative
efforts among healthcare professionals and data scientists.
( 2
min )
Leukemia is one of the most common and most deadly types of cancer. Medical
data drawn from a patient's critical parameters contain valuable hidden
information, and deep learning can be used to extract it. In this paper,
AutoEncoders are used to derive features that improve the precision of leukemia
diagnosis. We search for the best activation function and optimizer for the
AutoEncoder and design the best-performing architecture for this neural
network. The proposed architecture is compared with classical machine learning
models in this area, and our method outperforms them in precision and F1-score
by more than 11%.
( 2
min )
Reservoir computing is a machine learning technique which has been shown to
be able to replicate the chaotic attractor, including the fractal dimension and
the entire Lyapunov spectrum, of the dynamical system on which it is trained.
We quantitatively relate the generalized synchronization dynamics of a driven
reservoir computer during the training stage to the performance of the
autonomous reservoir computer at the attractor reconstruction task. We show
that, for successful attractor reconstruction and Lyapunov exponent estimation,
the largest conditional Lyapunov exponent of the driven reservoir must be
significantly smaller (more negative) than the smallest (most negative)
Lyapunov exponent of the true system. We find that the maximal conditional
Lyapunov exponent of the reservoir depends strongly on the spectral radius of
the reservoir adjacency matrix, and therefore, for attractor reconstruction and
Lyapunov exponent estimation, small spectral radius reservoir computers perform
better in general. Our arguments are supported by numerical examples on
well-known chaotic systems.
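Since the result hinges on the spectral radius of the reservoir adjacency matrix, here is a small sketch of the standard practice of rescaling a reservoir matrix to a target spectral radius. This is not the paper's code: the dominant eigenvalue is estimated with power iteration, and the 2x2 matrix is illustrative.

```python
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def spectral_radius(W, iters=200):
    """Estimate |lambda_max| of W by power iteration from a random start."""
    v = [random.random() + 0.1 for _ in W]
    for _ in range(iters):
        v = matvec(W, v)
        norm = max(abs(x) for x in v) or 1.0
        v = [x / norm for x in v]
    Wv = matvec(W, v)
    i = max(range(len(v)), key=lambda k: abs(v[k]))
    return abs(Wv[i] / v[i])

def rescale(W, rho):
    """Scale W so that its spectral radius becomes rho."""
    r = spectral_radius(W)
    return [[rho * w / r for w in row] for row in W]

random.seed(0)
W = [[0.5, 0.2], [0.1, 0.4]]      # spectral radius 0.6
W_small = rescale(W, 0.3)         # small-spectral-radius reservoir matrix
```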
( 2
min )
In this paper we show how tensor networks help in developing explainability
of machine learning algorithms. Specifically, we develop an unsupervised
clustering algorithm based on Matrix Product States (MPS) and apply it in the
context of a real use-case of adversary-generated threat intelligence. Our
investigation proves that MPS rival traditional deep learning models such as
autoencoders and GANs in terms of performance, while providing much richer
model interpretability. Our approach naturally facilitates the extraction of
feature-wise probabilities, Von Neumann Entropy, and mutual information,
offering a compelling narrative for classification of anomalies and fostering
an unprecedented level of transparency and interpretability, something
fundamental to understand the rationale behind artificial intelligence
decisions.
( 2
min )
Causal inference is a crucial goal of science, enabling researchers to arrive
at meaningful conclusions regarding the predictions of hypothetical
interventions using observational data. Path models, Structural Equation Models
(SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to
unambiguously specify assumptions regarding the causal structure underlying a
phenomenon. Unlike DAGs, which make very few assumptions about the functional
and parametric form, SEM assumes linearity. This can result in functional
misspecification which prevents researchers from undertaking reliable effect
size estimation. In contrast, we propose Super Learner Equation Modeling, a
path modeling technique integrating machine learning Super Learner ensembles.
We empirically demonstrate its ability to provide consistent and unbiased
estimates of causal effects, its competitive performance for linear models when
compared with SEM, and highlight its superiority over SEM when dealing with
non-linear relationships. We provide open-source code, and a tutorial notebook
with example usage, accentuating the easy-to-use nature of the method.
( 2
min )
Broadband infrastructure owners do not always know how their customers are
connected in the local networks, which are structured as rooted trees. A recent
study is able to infer the topology of a local network using discrete time
series data from the leaves of the tree (customers). In this study we propose a
contrastive approach for learning a binary event encoder from continuous time
series data. As a preliminary result, we show that our approach has some
potential in learning a valuable encoder.
( 2
min )
MIT researchers introduce a method that uses artificial intelligence to automate the explanation of complex neural networks.
( 11
min )
A new study finds that language regions in the left hemisphere light up when reading uncommon sentences, while straightforward sentences elicit little response.
( 9
min )
Artificial Intelligence (AI) has been around for many decades but now it has become a buzzword even among non-technical people because of the generative AI models like ChatGPT, Bard, Scribe, Claude, DALL·E 2, and a lot more. AI has moved beyond its sci-fi origins to reality, creating human-like content and powering self-driving cars. However, despite…
The post Mitigating Ethical Risks in Generative AI: Strategies for a Safe and Secure AI Application appeared first on Data Science Central.
( 21
min )
Inventory Routing Problem (IRP) is a crucial challenge in supply chain
management as it involves optimizing efficient route selection while
considering the uncertainty of inventory demand planning. To solve IRPs,
usually a two-stage approach is employed, where demand is predicted using
machine learning techniques first, and then an optimization algorithm is used
to minimize routing costs. Our experiment shows machine learning models fall
short of achieving perfect accuracy because inventory levels are influenced by
the dynamic business environment, which, in turn, affects the optimization
problem in the next stage, resulting in sub-optimal decisions. In this paper,
we formulate and propose a decision-focused learning-based approach to solving
real-world IRPs. This approach directly integrates inventory prediction and
routing optimization within an end-to-end system potentially ensuring a robust
supply chain strategy.
( 2
min )
We analyze a stochastic approximation algorithm for decision-dependent
problems, wherein the data distribution used by the algorithm evolves along the
iterate sequence. The primary examples of such problems appear in performative
prediction and its multiplayer extensions. We show that under mild assumptions,
the deviation between the average iterate of the algorithm and the solution is
asymptotically normal, with a covariance that clearly decouples the effects of
the gradient noise and the distributional shift. Moreover, building on the work
of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm
with averaging is locally minimax optimal.
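The averaged stochastic approximation scheme analysed above can be sketched on a scalar quadratic. This is a minimal illustration, not the decision-dependent setting of the paper: plain SGD iterates with a decaying step size, plus the running (Polyak-Ruppert) average whose deviation from the solution is the quantity shown to be asymptotically normal. The step-size exponent and noise are illustrative.

```python
import random

random.seed(1)
target = 3.0          # minimizer of 0.5*(x - target)^2
x = 0.0
avg = 0.0
n_steps = 20000
for k in range(1, n_steps + 1):
    noise = random.gauss(0.0, 1.0)
    g = (x - target) + noise        # noisy gradient
    x -= (1.0 / k**0.7) * g         # slowly decaying step size
    avg += (x - avg) / k            # running average of the iterates
```

The averaged iterate `avg` concentrates around `target` much more tightly than the last iterate `x`.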
( 2
min )
We attempt to generate new bridge types using generative artificial
intelligence technology. A symmetric structured image dataset of three-span
beam bridges, arch bridges, cable-stayed bridges and suspension bridges is
used. Based on the Python programming language, the TensorFlow and Keras deep
learning framework, as well as the Wasserstein loss function and Lipschitz
constraints, a generative adversarial network is constructed and trained. By
sampling from the obtained low-dimensional bridge-type latent space, new bridge
types with asymmetric structures can be generated. The generative adversarial
network can create new bridge types by organically combining different
structural components on the basis of human-designed bridge types, exhibiting a
certain degree of original creative ability. Generative artificial intelligence
technology can open up the imagination space and inspire humanity.
( 2
min )
Despite the efficient market hypothesis, many studies suggest the existence
of inefficiencies in the stock market leading to the development of techniques
to gain above-market returns. Systematic trading has undergone significant
advances in recent decades with deep learning schemes emerging as a powerful
tool for analyzing and predicting market behavior. In this paper, a method is
proposed that is inspired by how professional technical analysts trade. This
scheme looks at stock prices of the previous 600 days and predicts whether the
stock price will rise or fall 10% or 20% within the next D days. In addition,
the proposed method uses ResNet's (a deep learning model) skip connections and
logits to increase the probability of the prediction. The model was trained and
tested on historical data from both the Korean and US stock markets. We show
that using a period label of 5 gives the best result: the method achieved a
profit more than 39% above the market return on the Korean market and more than
40% above the market return on the US market.
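The labelling rule described above can be sketched as follows. Function name, series, and parameters are illustrative, not from the paper: the label for day `t` records whether the price rises by at least `threshold` within the next `D` days.

```python
def label_rise(prices, t, D=5, threshold=0.10):
    """True if the price gains at least `threshold` (e.g. 10%) over
    prices[t] at any point within the next D days."""
    base = prices[t]
    future = prices[t + 1 : t + 1 + D]
    return any(p >= base * (1 + threshold) for p in future)

prices = [100, 101, 99, 112, 108, 95, 96]
```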
( 2
min )
In the arena of privacy-preserving machine learning, differentially private
stochastic gradient descent (DP-SGD) has outstripped the objective perturbation
mechanism in popularity and interest. Though unrivaled in versatility, DP-SGD
requires a non-trivial privacy overhead (for privately tuning the model's
hyperparameters) and a computational complexity which might be extravagant for
simple models such as linear and logistic regression. This paper revamps the
objective perturbation mechanism with tighter privacy analyses and new
computational tools that boost it to perform competitively with DP-SGD on
unconstrained convex generalized linear problems.
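A schematic one-dimensional instance of the objective perturbation mechanism follows. This is an illustration of the general idea, not the paper's calibrated mechanism: a random linear term `b*theta` is added to a strongly convex objective before exact minimization, so the released minimizer itself is randomized. The noise scale is illustrative, not a differential privacy guarantee.

```python
import random

def perturbed_erm(xs, ys, lam=1.0, noise_scale=0.1, rng=None):
    """Minimize sum((y - theta*x)^2) + lam*theta^2 + b*theta exactly,
    where b is random noise injected into the objective. Setting the
    derivative to zero gives theta = (sum(x*y) - b/2) / (sum(x^2) + lam)."""
    rng = rng or random.Random(0)
    b = rng.gauss(0.0, noise_scale)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy - b / 2.0) / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
theta_priv = perturbed_erm(xs, ys)
theta_clean = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + 1.0)
```

Unlike DP-SGD, which perturbs every gradient step, the noise here enters the objective once and the (cheap, closed-form) minimization is exact.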
( 2
min )
In the intricate architecture of the mammalian central nervous system,
neurons form populations. Axonal bundles communicate between these clusters
using spike trains. However, these neuron populations' precise encoding and
operations have yet to be discovered. In our analysis, the starting point is a
state-of-the-art mechanistic model of a generic neuron endowed with plasticity.
From this simple framework emerges a subtle mathematical construct: The
representation and manipulation of information can be precisely characterized
by an algebra of convex cones. Furthermore, these neuron populations are not
merely passive transmitters. They act as operators within this algebraic
structure, mirroring the functionality of a low-level programming language.
When these populations interconnect, they embody succinct yet potent algebraic
expressions. These networks allow them to implement many operations, such as
specialization, generalization, novelty detection, dimensionality reduction,
inverse modeling, prediction, and associative memory. In broader terms, this
work illuminates the potential of matrix embeddings in advancing our
understanding in fields like cognitive science and AI. These embeddings enhance
the capacity for concept processing and hierarchical description over their
vector counterparts.
( 3
min )
Deep learning for Hamiltonian regression of quantum systems in material
research necessitates satisfying the covariance laws, among which achieving
SO(3)-equivariance without sacrificing the expressiveness of networks remains
an elusive challenge due to the restriction on non-linear mappings when
guaranteeing theoretical equivariance. To alleviate the
covariance-expressiveness dilemma, we propose a hybrid framework with two
cascaded regression stages. The first stage, with a theoretically-guaranteed
covariant neural network modeling symmetry properties of 3D atom systems,
yields theoretically covariant features and baseline Hamiltonian predictions,
assisting the second stage in learning covariance. Meanwhile, the second stage,
powered by a non-linear 3D graph Transformer network we propose for structural
modeling of 3D atomic systems, refines the first stage's output as a
fine-grained prediction of Hamiltonians with better expressiveness capability.
The combination of a theoretically covariant yet inevitably less expressive
model with a highly expressive non-linear network enables precise,
generalizable predictions while maintaining robust covariance under coordinate
transformations. Our method achieves state-of-the-art performance in
Hamiltonian prediction for electronic structure calculations, confirmed through
experiments on five crystalline material databases.
( 2
min )
In continual learning from demonstration (CLfD), a robot learns a sequence of
real-world motion skills continually from human demonstrations. Recently,
hypernetworks have been successful in solving this problem. In this paper, we
perform an exploratory study of the effects of different optimizers,
initializers, and network architectures on the continual learning performance
of hypernetworks for CLfD. Our results show that adaptive learning rate
optimizers work well, but initializers specially designed for hypernetworks
offer no advantages for CLfD. We also show that hypernetworks that are capable
of stable trajectory predictions are robust to different network architectures.
Our open-source code is available at
https://github.com/sebastianbergner/ExploringCLFD.
( 2
min )
The tasks of designing messenger RNAs and non-coding RNAs are discrete
optimization problems, and several versions of these problems are NP-hard. As
an alternative to commonly used local search methods, we formulate these
problems as continuous optimization and develop a general framework for this
optimization based on a new concept of "expected partition function". The basic
idea is to start with a distribution over all possible candidate sequences, and
extend the objective function from a sequence to a distribution. We then use
gradient descent-based optimization methods to improve the extended objective
function, and the distribution will gradually shrink towards a one-hot sequence
(i.e., a single sequence). We consider two important case studies within this
framework, the mRNA design problem optimizing for partition function (i.e.,
ensemble free energy) and the non-coding RNA design problem optimizing for
conditional (i.e., Boltzmann) probability. In both cases, our approach
demonstrates promising preliminary results. We make our code available at
https://github.com/KuNyaa/RNA_Design_codebase.
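The "optimize a distribution, then let it shrink to a one-hot" idea can be shown on a toy instance. This is not the expected partition function itself: the real method works over exponentially large sequence spaces with dynamic-programming expectations, whereas here we keep an explicit categorical distribution over three stand-in candidates with arbitrary scores.

```python
import math

scores = [3.0, 1.0, 2.5]          # objective value per candidate; lower is better
logits = [0.0, 0.0, 0.0]          # parameters of the categorical distribution

def softmax(ls):
    m = max(ls)
    e = [math.exp(x - m) for x in ls]
    s = sum(e)
    return [x / s for x in e]

for _ in range(500):
    p = softmax(logits)
    expected = sum(pi * fi for pi, fi in zip(p, scores))
    # exact gradient of the expected objective: dE[f]/dlogit_j = p_j*(f_j - E[f])
    grads = [pj * (fj - expected) for pj, fj in zip(p, scores)]
    logits = [l - 0.5 * g for l, g in zip(logits, grads)]

p_final = softmax(logits)         # concentrates on the best-scoring candidate
```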
( 2
min )
We introduce a graph-aware autoencoder ensemble framework, with associated
formalisms and tooling, designed to facilitate deep learning for scholarship in
the humanities. By composing sub-architectures to produce a model isomorphic to
a humanistic domain we maintain interpretability while providing function
signatures for each sub-architectural choice, allowing both traditional and
computational researchers to collaborate without disrupting established
practices. We illustrate a practical application of our approach to a
historical study of the American post-Atlantic slave trade, and make several
specific technical contributions: a novel hybrid graph-convolutional
autoencoder mechanism, batching policies for common graph topologies, and
masking techniques for particular use-cases. The effectiveness of the framework
for broadening participation of diverse domains is demonstrated by a growing
suite of two dozen studies, both collaborations with humanists and established
tasks from machine learning literature, spanning a variety of fields and data
modalities. We make performance comparisons of several different architectural
choices and conclude with an ambitious list of imminent next steps for this
research.
( 2
min )
Experimental particle physics uses machine learning for many tasks, where
one application is to classify signal and background events. The classification
can be used to bin an analysis region to enhance the expected significance for
a mass resonance search. In natural language processing, one of the leading
neural network architectures is the transformer. In this work, an event
classifier transformer is proposed to bin an analysis region, in which the
network is trained with special techniques. The techniques developed here can
enhance the significance and reduce the correlation between the network's
output and the reconstructed mass. It is found that this trained network can
perform better than boosted decision trees and feed-forward networks.
( 2
min )
One among several advantages of measure transport methods is that they allow
for a unified framework for processing and analysis of data distributed
according to a wide class of probability measures. Within this context, we
present results from computational studies aimed at assessing the potential of
measure transport techniques, specifically, the use of triangular transport
maps, as part of a workflow intended to support research in the biological
sciences. Scarce data scenarios, which are common in domains such as radiation
biology, are of particular interest. We find that when data is scarce, sparse
transport maps are advantageous. In particular, statistics gathered from
computing a series of (sparse) adaptive transport maps, trained on randomly
chosen subsets of the available data samples, lead to
uncovering information hidden in the data. As a result, in the radiation
biology application considered here, this approach provides a tool for
generating hypotheses about gene relationships and their dynamics under
radiation exposure.
( 2
min )
We develop a new efficient sequential approximate leverage score algorithm,
SALSA, using methods from randomized numerical linear algebra (RandNLA) for
large matrices. We demonstrate that, with high probability, the accuracy of
SALSA's approximations is within $(1 + O({\varepsilon}))$ of the true leverage
scores. In addition, we show that the theoretical computational complexity and
numerical accuracy of SALSA surpass existing approximations. These theoretical
results are subsequently utilized to develop an efficient algorithm, named
LSARMA, for fitting an appropriate ARMA model to large-scale time series data.
Our proposed algorithm is, with high probability, guaranteed to find the
maximum likelihood estimates of the parameters for the true underlying ARMA
model. Furthermore, it has a worst-case running time that significantly
improves those of the state-of-the-art alternatives in big data regimes.
Empirical results on large-scale data strongly support these theoretical
results and underscore the efficacy of our new approach.
( 2
min )
We attempt to generate new bridge types using generative artificial
intelligence technology. Grayscale images of bridge facades with varying
component widths were rendered with the 3dsMax animation software, and the
OpenCV module then applied an appropriate amount of geometric transformation
(rotation, horizontal scaling, vertical scaling) to obtain an image dataset of
three-span beam bridges, arch bridges, cable-stayed bridges and suspension
bridges. Based on the Python programming language and the TensorFlow and Keras
deep learning framework, a variational autoencoder was constructed and trained,
yielding a low-dimensional bridge-type latent space that is convenient for
vector operations. The variational autoencoder can combine two human-designed
bridge types into a new one. Generative artificial intelligence technology can
assist bridge designers in bridge-type innovation and serve as a copilot.
( 2
min )
While Hopfield networks are known as paradigmatic models for memory storage
and retrieval, modern artificial intelligence systems mainly stand on the
machine learning paradigm. We show that it is possible to formulate a
teacher-student self-supervised learning problem with Boltzmann machines in
terms of a suitable generalization of the Hopfield model with structured
patterns, where the spin variables are the machine weights and patterns
correspond to the training set's examples. We analyze the learning performance
by studying the phase diagram in terms of the training set size, the dataset
noise and the inference temperature (i.e. the weight regularization). With a
small but informative dataset the machine can learn by memorization. With a
noisy dataset, an extensive number of examples above a critical threshold is
needed. In this regime the memory storage limits of the system become an
opportunity for the occurrence of a learning regime in which the system can
generalize.
( 2
min )
This work focuses on plant leaf disease classification and explores three
crucial aspects: adversarial training, model explainability, and model
compression. The models' robustness against adversarial attacks is enhanced
through adversarial training, ensuring accurate classification even in the
presence of threats. Leveraging explainability techniques, we gain insights
into the model's decision-making process, improving trust and transparency.
Additionally, we explore model compression techniques to optimize computational
efficiency while maintaining classification performance. Through our
experiments, we determine that, on a benchmark dataset, robustness can come at
the price of classification accuracy, with performance reductions of 3%-20% on
regular tests and gains of 50%-70% on adversarial attack tests. We also
demonstrate that a student model can be 15-25 times more computationally
efficient for a slight performance reduction, distilling the knowledge of more
complex models.
( 2
min )
In this paper, we propose a novel and general framework to construct tight
framelet systems on graphs with localized supports based on hierarchical
partitions. Our construction provides parametrized graph framelet systems with
great generality based on partition trees, by which we are able to find the
size of a low-dimensional subspace that best fits the low-rank structure of a
family of signals. The orthogonal decomposition of subspaces provides a key
ingredient for the definition of "generalized vanishing moments" for graph
framelets. In a data-adaptive setting, the graph framelet systems can be
learned by solving an optimization problem on Stiefel manifolds with respect to
our parameterization. Moreover, such graph framelet systems can be further
improved by solving a subsequent optimization problem on Stiefel manifolds,
aiming at providing the utmost sparsity for a given family of graph signals.
Experimental results show that our learned graph framelet systems perform
superiorly in non-linear approximation and denoising tasks.
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions; one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized, and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate; one thereby obtains an a priori stopping time for
any prescribed proximity to the global minimum. We point out relations of the
latter to sub-Riemannian geometry.
( 2
min )
Magnetic particle imaging (MPI) is an emerging medical imaging modality which
has gained increasing interest in recent years. Among the benefits of MPI are
its high temporal resolution, and that the technique does not expose the
specimen to any kind of ionizing radiation. It is based on the non-linear
response of magnetic nanoparticles to an applied magnetic field. From the
electric signal measured in receive coils, the particle concentration has to be
reconstructed. Due to the ill-posedness of the reconstruction problem, various
regularization methods have been proposed for reconstruction ranging from early
stopping methods, via classical Tikhonov regularization and iterative methods
to modern machine learning approaches. In this work, we contribute to the
latter class: we propose a plug-and-play approach based on a generic zero-shot
denoiser with an $\ell^1$-prior. Moreover, we develop parameter selection
strategies. Finally, we quantitatively and qualitatively evaluate the proposed
algorithmic scheme on the 3D Open MPI data set with different levels of
preprocessing.
( 3
min )
Combinatorial Optimization (CO) problems over graphs appear routinely in many
applications such as in optimizing traffic, viral marketing in social networks,
and matching for job allocation. Due to their combinatorial nature, these
problems are often NP-hard. Existing approximation algorithms and heuristics
rely on the search space to find the solutions and become time-consuming when
this space is large. In this paper, we design a neural method called COMBHelper
to reduce this space and thus improve the efficiency of the traditional CO
algorithms based on node selection. Specifically, it employs a Graph Neural
Network (GNN) to identify promising nodes for the solution set. This pruned
search space is then fed to the traditional CO algorithms. COMBHelper also uses
a Knowledge Distillation (KD) module and a problem-specific boosting module to
bring further efficiency and efficacy. Our extensive experiments show that the
traditional CO algorithms with COMBHelper are at least 2 times faster than
their original versions.
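The core idea of pruning the search space before running a traditional CO algorithm can be sketched with a toy minimum vertex cover instance. Here a simple degree score stands in for the paper's GNN node scores; the graph, the score, and the cutoff `k` are all illustrative assumptions.

```python
def greedy_vertex_cover(edges, candidates):
    """Greedy cover over a (possibly pruned) candidate set: repeatedly pick
    the candidate that touches the most still-uncovered edges."""
    uncovered = set(edges)
    cover = []
    while uncovered:
        best = max(candidates, key=lambda v: sum(v in e for e in uncovered))
        if sum(best in e for e in uncovered) == 0:
            raise ValueError("pruned candidate set cannot cover all edges")
        cover.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover

# Toy graph: two hub nodes (0 and 6) plus leaves
edges = [(0, i) for i in range(1, 6)] + [(6, i) for i in range(7, 11)] + [(0, 6)]
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# "Pruning" step: keep only the k highest-scoring nodes as candidates,
# then hand the reduced space to the traditional greedy algorithm.
k = 2
candidates = sorted(degree, key=degree.get, reverse=True)[:k]
cover = greedy_vertex_cover(edges, candidates)
```

The greedy algorithm now only examines `k` candidates per step instead of all nodes, which is where the speedup comes from when the identified nodes are indeed promising.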
( 2
min )
The remarkable properties of High-Entropy Alloys (HEAs) are rooted in the
diverse phases and crystal structures they contain. In the
realm of material informatics, employing machine learning (ML) techniques to
classify phases and crystal structures of HEAs has gained considerable
significance. In this study, we assembled a new collection of 1345 HEAs with
varying compositions to predict phases. Within this collection, there were 705
sets of data that were utilized to predict the crystal structures with the help
of thermodynamics and electronic configuration. Our study introduces a
methodical feature-selection framework based on the Pearson correlation
coefficient, which retains strongly correlated features to increase prediction
accuracy.
This study employed five distinct boosting algorithms to predict phases and
crystal structures, offering an enhanced guideline for improving the accuracy
of these predictions. Among all these algorithms, XGBoost gives the highest
accuracy of prediction (94.05%) for phases and LightGBM gives the highest
accuracy of prediction of the crystal structure of the phases (90.07%). We
quantified the influence of each parameter on model accuracy and developed a
new approach to elucidate the contribution of individual parameters to phase
and crystal-structure prediction.
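Correlation-based feature selection of the kind described can be sketched in a few lines: compute each feature's Pearson correlation with the target and keep only those above a threshold. The synthetic data and the 0.5 cutoff are illustrative assumptions, not the paper's HEA descriptors.

```python
import numpy as np

def pearson_select(X, y, threshold=0.5):
    """Keep features whose absolute Pearson correlation with y exceeds threshold."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) > threshold), r

# Synthetic data: 2 informative features, 3 pure-noise features
rng = np.random.default_rng(1)
n = 200
informative = rng.standard_normal((n, 2))
noise = rng.standard_normal((n, 3))
y = informative @ np.array([1.0, -1.0]) + 0.1 * rng.standard_normal(n)
X = np.hstack([informative, noise])
selected, r = pearson_select(X, y)
```

The selected indices would then be the columns fed to the boosting models (XGBoost, LightGBM, etc.).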
( 3
min )
Accumulated Local Effects (ALE) is a model-agnostic approach for global
explanations of the results of black-box machine learning (ML) algorithms.
There are at least three challenges with conducting statistical inference based
on ALE: ensuring the reliability of ALE analyses, especially in the context of
small datasets; intuitively characterizing a variable's overall effect in ML;
and making robust inferences from ML data analysis. In response, we introduce
innovative tools and techniques for statistical inference using ALE,
establishing bootstrapped confidence intervals tailored to dataset size and
introducing ALE effect size measures that intuitively indicate effects on both
the outcome variable scale and a normalized scale. Furthermore, we demonstrate
how to use these tools to draw reliable statistical inferences, reflecting the
flexible patterns ALE adeptly highlights, with implementations available in the
'ale' package in R. This work propels the discourse on ALE and its
applicability in ML and statistical analysis forward, offering practical
solutions to prevailing challenges in the field.
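The implementations live in the 'ale' R package; as a language-agnostic sketch of the bootstrapped-confidence-interval ingredient, the snippet below computes a percentile bootstrap CI for a statistic of a small sample in Python. The sample, statistic, and parameters are illustrative, not the package's API.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a sample."""
    rng = np.random.default_rng(seed)
    reps = np.array([stat(rng.choice(data, size=len(data), replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

rng = np.random.default_rng(42)
sample = rng.normal(loc=3.0, scale=1.0, size=50)  # a "small dataset"
lo, hi = bootstrap_ci(sample)
```

For ALE, the same resampling would be applied to the ALE curve itself rather than a scalar mean, yielding pointwise bands tailored to the dataset size.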
( 3
min )
We study a principal component analysis problem under the spiked Wishart
model in which the structure in the signal is captured by a class of
union-of-subspace models. This general class includes vanilla sparse PCA as
well as its variants with graph sparsity. With the goal of studying these
problems under a unified statistical and computational lens, we establish
fundamental limits that depend on the geometry of the problem instance, and
show that a natural projected power method exhibits local convergence to the
statistically near-optimal neighborhood of the solution. We complement these
results with end-to-end analyses of two important special cases given by path
and tree sparsity in a general basis, showing initialization methods and
matching evidence of computational hardness. Overall, our results indicate that
several of the phenomena observed for vanilla sparse PCA extend in a natural
fashion to its structured counterparts.
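A projected power method for the vanilla sparse-PCA special case can be sketched directly: power iteration interleaved with a hard-threshold projection onto k-sparse vectors, initialized from the largest diagonal entries (a standard initialization; the paper's structured projections and guarantees are more general). The spiked matrix below is an illustrative construction.

```python
import numpy as np

def projected_power(A, k, n_iter=50):
    """Power iteration with hard-threshold projection onto k-sparse vectors,
    initialized by diagonal thresholding."""
    keep = np.argsort(np.diag(A))[-k:]
    v = np.zeros(A.shape[0])
    v[keep] = 1 / np.sqrt(k)
    for _ in range(n_iter):
        w = A @ v
        keep = np.argsort(np.abs(w))[-k:]       # project: keep k largest entries
        v = np.zeros_like(w)
        v[keep] = w[keep]
        v /= np.linalg.norm(v)
    return v

# Spiked model: planted 3-sparse direction plus small symmetric noise
rng = np.random.default_rng(3)
d, k = 50, 3
u = np.zeros(d)
u[[4, 17, 30]] = 1 / np.sqrt(3)
N = rng.standard_normal((d, d))
A = 5.0 * np.outer(u, u) + 0.1 * (N + N.T) / 2
v = projected_power(A, k)
```

With a strong spike, the iteration locks onto the planted support and aligns with the planted direction.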
( 2
min )
Self-supervised and language-supervised image models contain rich knowledge
of the world that is important for generalization. Many robotic tasks, however,
require a detailed understanding of 3D geometry, which is often lacking in 2D
image features. This work bridges this 2D-to-3D gap for robotic manipulation by
leveraging distilled feature fields to combine accurate 3D geometry with rich
semantics from 2D foundation models. We present a few-shot learning method for
6-DOF grasping and placing that harnesses these strong spatial and semantic
priors to achieve in-the-wild generalization to unseen objects. Using features
distilled from a vision-language model, CLIP, we present a way to designate
novel objects for manipulation via free-text natural language, and demonstrate
its ability to generalize to unseen expressions and novel categories of
objects.
( 2
min )
Federated Learning (FL) is a machine-learning approach enabling collaborative
model training across multiple decentralized edge devices that hold local data
samples, all without exchanging these samples. This collaborative process
occurs under the supervision of a central server orchestrating the training or
via a peer-to-peer network. The significance of FL is particularly pronounced
in industries such as healthcare and finance, where data privacy holds
paramount importance. However, training a model under the Federated learning
setting brings forth several challenges, with one of the most prominent being
the heterogeneity of data distribution among the edge devices. The data is
typically non-independently and non-identically distributed (non-IID), thereby
presenting challenges to model convergence. This report delves into the issues
arising from non-IID and heterogeneous data and explores current algorithms
designed to address these challenges.
( 2
min )
Deep Learning models have shown success in a large variety of tasks by
extracting correlation patterns from high-dimensional data but still struggle
when generalizing out of their initial distribution. As causal engines aim to
learn mechanisms independent from a data distribution, combining Deep Learning
with Causality can have a great impact on the two fields. In this paper, we
further motivate this assumption. We perform an extensive overview of the
theories and methods for Causality from different perspectives, with an
emphasis on Deep Learning and the challenges met by the two domains. We show
early attempts to bring the fields together and the possible perspectives for
the future. We finish by providing a large variety of applications for
techniques from Causality.
( 2
min )
There is recent interest in first-order methods for linear programming
(LP). In this paper, we propose a stochastic algorithm using variance reduction
and restarts for solving sharp primal-dual problems such as LP. We show that
the proposed stochastic method exhibits a linear convergence rate for solving
sharp instances with a high probability. In addition, we propose an efficient
coordinate-based stochastic oracle for unconstrained bilinear problems, which
has $\mathcal O(1)$ per iteration cost and improves the complexity of the
existing deterministic and stochastic algorithms. Finally, we show that the
obtained linear convergence rate is nearly optimal (up to $\log$ terms) for a
wide class of stochastic primal-dual methods.
( 2
min )
Group equivariant non-expansive operators have been recently proposed as
basic components in topological data analysis and deep learning. In this paper
we study some geometric properties of the spaces of group equivariant operators
and show how a space $\mathcal{F}$ of group equivariant non-expansive operators
can be endowed with the structure of a Riemannian manifold, thereby making
gradient descent methods available for the minimization of cost functions on
$\mathcal{F}$. As an application of this approach, we also describe a procedure
to select a finite set of representative group equivariant non-expansive
operators in the considered manifold.
( 2
min )
We present an elementary yet general proof of duality for Wasserstein
distributionally robust optimization. The duality holds for an arbitrary
Kantorovich transport cost, measurable loss function, and nominal probability
distribution, provided that an interchangeability principle holds, which is
equivalent to certain measurability conditions. To illustrate the broader
applicability of our approach, we provide a rigorous treatment of duality
results in distributionally robust Markov decision processes and
distributionally robust multistage stochastic programming. Furthermore, we
extend the result to other problems including infinity-Wasserstein
distributionally robust optimization, risk-averse optimization, and globalized
distributionally robust counterpart.
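For concreteness, one common form of the strong duality referred to here (for the worst-case expected loss over a Wasserstein ball of radius $\rho$ around a nominal distribution $P_0$ with transport cost $c$) is the standard statement from the literature; the paper's result is more general, so this is only an illustrative special case:

```latex
\sup_{P:\, W_c(P, P_0) \le \rho} \mathbb{E}_{P}\big[f(\zeta)\big]
  \;=\; \inf_{\lambda \ge 0} \Big\{ \lambda \rho
  + \mathbb{E}_{\xi \sim P_0}\Big[\, \sup_{\zeta} \big( f(\zeta)
  - \lambda\, c(\zeta, \xi) \big) \Big] \Big\}.
```

The interchangeability principle mentioned in the abstract is what allows the inner supremum to be taken pointwise under the expectation.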
( 2
min )
Multiscale stochastic dynamical systems have been widely applied to a variety
of scientific and engineering problems due to their capability of depicting
complex phenomena in many real world applications. This work is devoted to
investigating the effective dynamics for slow-fast stochastic dynamical
systems. Given short-term observation data generated by unknown slow-fast
stochastic systems, we propose a novel algorithm, including a neural network
called Auto-SDE, to learn the invariant slow manifold. Our approach captures
the evolutionary nature of a series of time-dependent autoencoder neural
networks with the loss constructed from a discretized stochastic differential
equation. Our algorithm is also validated to be accurate, stable and effective
through numerical experiments under various evaluation metrics.
( 2
min )
In this paper we will discuss metalearning and how we can go beyond the
current classical learning paradigm. We will first address the importance of
inductive biases in the learning process and what is at stake: the quantities
of data necessary to learn. We will subsequently see the importance of choosing
suitable parameterizations to end up with well-defined learning processes.
Especially since in the context of real-world applications, we face numerous
biases due, e.g., to the specificities of sensors, the heterogeneity of data
sources, the multiplicity of points of view, etc. This will lead us to the idea
of exploiting the structuring of the concepts to be learned in order to
organize the learning process that we published previously. We conclude by
discussing the perspectives around parameter-tying schemes and the emergence of
universal aspects in the models thus learned.
( 2
min )
Large language model (LLM) scaling laws are empirical formulas that estimate
changes in model quality as a result of increasing parameter count and training
data. However, these formulas, including the popular DeepMind Chinchilla
scaling laws, neglect to include the cost of inference. We modify the
Chinchilla scaling laws to calculate the optimal LLM parameter count and
pre-training data size to train and deploy a model of a given quality and
inference demand. We conduct our analysis both in terms of a compute budget and
real-world costs and find that LLM researchers expecting reasonably large
inference demand (~1B requests) should train models smaller and longer than
Chinchilla-optimal.
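The cost asymmetry driving this conclusion can be illustrated with the standard FLOP approximations (~6ND to train a model with N parameters on D tokens, ~2N per generated token at inference). The specific token counts below, including the ~2T inference tokens standing in for "~1B requests", are illustrative assumptions; whether the two configurations reach equal quality is exactly what the modified scaling laws determine.

```python
def lifetime_flops(n_params, train_tokens, inference_tokens):
    """Standard approximations: ~6*N*D FLOPs to train, ~2*N per inferred token."""
    return 6 * n_params * train_tokens + 2 * n_params * inference_tokens

INFERENCE_TOKENS = 2e12  # assumed lifetime inference demand

# Two models with the same training compute budget (6*N*D held constant):
big = lifetime_flops(70e9, 1.4e12, INFERENCE_TOKENS)    # 70B on 1.4T tokens
small = lifetime_flops(35e9, 2.8e12, INFERENCE_TOKENS)  # half params, 2x tokens
```

At equal training cost, the smaller-and-longer model wins on lifetime compute once inference demand is large, since inference cost scales with parameter count.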
( 2
min )
In this work, we consider the offline preference-based reinforcement learning
problem. We focus on the two-phase learning approach that is prevalent in
previous reinforcement-learning-from-human-preference works. We identify a
challenge in applying two-phase learning in the offline PBRL setting: the
learned utility model can be too hard for the learning agent to optimize during
the second learning phase. To overcome this challenge, we propose a two-phase
learning approach under behavior regularization through action clipping. The
insight is that the state-actions which are poorly covered by the dataset can
only provide limited information and increase the complexity of the problem in
the second learning phase. Our method ignores such state-actions during the
second learning phase to achieve higher learning efficiency. We empirically
verify that our method has high learning efficiency on a variety of datasets in
robotic control environments.
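The behavior-regularization-by-clipping idea can be sketched very simply: constrain proposed actions to the range the offline dataset actually covers. A global per-dimension [min, max] box is a simplification of whatever coverage estimate the paper uses; the arrays below are illustrative.

```python
import numpy as np

def clip_to_support(actions, dataset_actions):
    """Clip each action dimension to the [min, max] range seen in the
    offline dataset, ignoring poorly covered state-actions."""
    lo = dataset_actions.min(axis=0)
    hi = dataset_actions.max(axis=0)
    return np.clip(actions, lo, hi)

# Toy 2-D action dataset and two out-of-support proposals
dataset = np.array([[-0.5, 0.2], [0.3, 0.9], [0.1, -0.4]])
proposed = np.array([[2.0, 0.5], [-1.0, -1.0]])
clipped = clip_to_support(proposed, dataset)
```

During the second learning phase, the agent would only ever evaluate the utility model on clipped actions, keeping optimization inside the region the data can inform.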
( 2
min )
Decision-making is a dynamic process requiring perception, memory, and
reasoning to make choices and find optimal policies. Traditional approaches to
decision-making suffer from limited sample efficiency and generalization, while
large-scale self-supervised pretraining has enabled fast adaptation with
fine-tuning or few-shot learning in language and vision. We thus argue to
integrate knowledge acquired from generic large-scale self-supervised
pretraining into downstream decision-making problems. We propose a
Pretrain-Then-Adapt pipeline and survey recent work on data collection,
pretraining objectives and adaptation strategies for decision-making
pretraining and downstream inference. Finally, we identify critical challenges
and future directions for developing decision foundation models with the help of
generic and flexible self-supervised pretraining.
( 2
min )
In the realm of cryptocurrency, the prediction of Bitcoin prices has garnered
substantial attention due to its potential impact on financial markets and
investment strategies. This paper propose a comparative study on hybrid machine
learning algorithms and leverage on enhancing model interpretability.
Specifically, linear regression(OLS, LASSO), long-short term memory(LSTM),
decision tree regressors are introduced. Through the grounded experiments, we
observe linear regressor achieves the best performance among candidate models.
For the interpretability, we carry out a systematic overview on the
preprocessing techniques of time-series statistics, including decomposition,
auto-correlational function, exponential triple forecasting, which aim to
excavate latent relations and complex patterns appeared in the financial
time-series forecasting. We believe this work may derive more attention and
inspire more researches in the realm of time-series analysis and its realistic
applications.
( 2
min )
This paper introduces an iterative algorithm designed to train additive
models with favorable memory storage and computational requirements. The
algorithm can be viewed as the functional counterpart of stochastic gradient
descent, applied to the coefficients of a truncated basis expansion of the
component functions. We show that the resulting estimator satisfies an oracle
inequality that allows for model misspecification. In the well-specified
setting, by choosing the learning rate carefully across three distinct stages
of training, we prove that its risk is minimax optimal in terms of the
dependence on the dimensionality of the data and the size of the training
sample.
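A one-dimensional sketch of "SGD on the coefficients of a truncated basis expansion": fit a target function by streaming stochastic gradient steps on the coefficients of a truncated Fourier basis. The basis, target, constant learning rate (the paper uses a three-stage schedule), and noiseless samples are all simplifying assumptions.

```python
import numpy as np

def basis(x, m):
    """Truncated Fourier basis on [0, 1]: constant plus m cosine/sine pairs."""
    feats = [np.ones_like(x)]
    for j in range(1, m + 1):
        feats += [np.cos(2 * np.pi * j * x), np.sin(2 * np.pi * j * x)]
    return np.stack(feats, axis=-1)

rng = np.random.default_rng(0)
m, lr, n_steps = 3, 0.1, 5000
theta = np.zeros(2 * m + 1)          # coefficients of the expansion
for _ in range(n_steps):
    x = rng.uniform(0, 1)            # one fresh sample per step (streaming SGD)
    y = np.sin(2 * np.pi * x)        # noiseless target, for the sketch
    phi = basis(np.array(x), m)
    theta -= lr * (phi @ theta - y) * phi   # stochastic gradient step

xs = np.linspace(0, 1, 200)
mse = np.mean((basis(xs, m) @ theta - np.sin(2 * np.pi * xs)) ** 2)
```

Memory cost is just the coefficient vector, which is the storage advantage the abstract highlights; an additive model would maintain one such expansion per component function.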
( 2
min )
Growth in the penetration of renewable energy sources makes supply more
uncertain and leads to an increase in the system imbalance. This trend,
together with the single imbalance pricing, opens an opportunity for balance
responsible parties (BRPs) to perform energy arbitrage in the imbalance
settlement mechanism. To this end, we propose a battery control framework based
on distributional reinforcement learning (DRL). Our proposed control framework
takes a risk-sensitive perspective, allowing BRPs to adjust their risk
preferences: we aim to optimize a weighted sum of the arbitrage profit and a
risk measure while constraining the daily number of cycles for the battery. We
assess the performance of our proposed control framework using the Belgian
imbalance prices of 2022 and compare two state-of-the-art RL methods, deep
Q-learning and soft actor-critic. Results reveal that the distributional soft
actor-critic method can outperform other methods. Moreover, we note that our
fully risk-averse agent appropriately learns to hedge against the risk related
to the unknown imbalance price by (dis)charging the battery only when the agent
is more certain about the price.
( 2
min )
We analyze a stochastic approximation algorithm for decision-dependent
problems, wherein the data distribution used by the algorithm evolves along the
iterate sequence. The primary examples of such problems appear in performative
prediction and its multiplayer extensions. We show that under mild assumptions,
the deviation between the average iterate of the algorithm and the solution is
asymptotically normal, with a covariance that clearly decouples the effects of
the gradient noise and the distributional shift. Moreover, building on the work
of H\'ajek and Le Cam, we show that the asymptotic performance of the algorithm
with averaging is locally minimax optimal.
( 2
min )
We develop a new efficient sequential approximate leverage score algorithm,
SALSA, using methods from randomized numerical linear algebra (RandNLA) for
large matrices. We demonstrate that, with high probability, the accuracy of
SALSA's approximations is within $(1 + O({\varepsilon}))$ of the true leverage
scores. In addition, we show that the theoretical computational complexity and
numerical accuracy of SALSA surpass existing approximations. These theoretical
results are subsequently utilized to develop an efficient algorithm, named
LSARMA, for fitting an appropriate ARMA model to large-scale time series data.
Our proposed algorithm is, with high probability, guaranteed to find the
maximum likelihood estimates of the parameters for the true underlying ARMA
model. Furthermore, it has a worst-case running time that significantly
improves on those of the state-of-the-art alternatives in big-data regimes.
Empirical results on large-scale data strongly support these theoretical
results and underscore the efficacy of our new approach.
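For reference, the quantity SALSA approximates has a simple exact form: the leverage scores of a tall matrix are the squared row norms of its thin-Q factor. The snippet below computes this exact baseline (feasible only for matrices small enough for a QR decomposition, which is precisely what RandNLA methods avoid); the test matrix is illustrative.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores: squared row norms of the thin-Q factor of A."""
    Q, _ = np.linalg.qr(A, mode='reduced')
    return np.sum(Q ** 2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
scores = leverage_scores(A)
```

Leverage scores always lie in [0, 1] and sum to the rank of the matrix, which makes them a natural importance-sampling distribution over rows.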
( 2
min )
Inventory Routing Problem (IRP) is a crucial challenge in supply chain
management as it involves optimizing efficient route selection while
considering the uncertainty of inventory demand planning. To solve IRPs,
usually a two-stage approach is employed, where demand is predicted using
machine learning techniques first, and then an optimization algorithm is used
to minimize routing costs. Our experiment shows machine learning models fall
short of achieving perfect accuracy because inventory levels are influenced by
the dynamic business environment, which, in turn, affects the optimization
problem in the next stage, resulting in sub-optimal decisions. In this paper,
we formulate and propose a decision-focused learning-based approach to solving
real-world IRPs. This approach directly integrates inventory prediction and
routing optimization within an end-to-end system, potentially ensuring a robust
supply chain strategy.
( 2
min )
We consider the gradient descent flow widely used for the minimization of the
$\mathcal{L}^2$ cost function in Deep Learning networks, and introduce two
modified versions; one adapted for the overparametrized setting, and the other
for the underparametrized setting. Both have a clear and natural invariant
geometric meaning, taking into account the pullback vector bundle structure in
the overparametrized, and the pushforward vector bundle structure in the
underparametrized setting. In the overparametrized case, we prove that,
provided that a rank condition holds, all orbits of the modified gradient
descent drive the $\mathcal{L}^2$ cost to its global minimum at a uniform
exponential convergence rate; one thereby obtains an a priori stopping time for
any prescribed proximity to the global minimum. We point out relations of the
latter to sub-Riemannian geometry.
( 2
min )
2024 will be all about changing business models due to the massive disruption of generative AI. There will be new winners and many losers. The incumbents especially have a lot to lose – but permissionless innovation has always been the hallmark of American innovation. We see the usual vanguard action from the incumbents who find…
The post Generative AI business model disruption: The NYT lawsuit posturing appeared first on Data Science Central.
( 20
min )
An MIT panel charts how artificial intelligence will impact art and design.
( 10
min )
An avid cyclist, Thomas Park knows the value of having lots of gears to maintain a smooth, fast ride. So, when the software architect designed an AI inference platform to serve predictions for Oracle Cloud Infrastructure’s (OCI) Vision AI service, he picked NVIDIA Triton Inference Server. That’s because it can shift up, down or sideways.
( 6
min )
A new year means new creative opportunities and new In the NVIDIA Studio beats.
( 7
min )
We introduce SOLAR 10.7B, a large language model (LLM) with 10.7 billion
parameters, demonstrating superior performance in various natural language
processing (NLP) tasks. Inspired by recent efforts to efficiently up-scale
LLMs, we present a method for scaling LLMs called depth up-scaling (DUS), which
encompasses depthwise scaling and continued pretraining. In contrast to other
LLM up-scaling methods that use mixture-of-experts, DUS does not require
complex changes for efficient training and inference. We show experimentally that
DUS is simple yet effective in scaling up high-performance LLMs from small
ones. Building on the DUS model, we additionally present SOLAR 10.7B-Instruct,
a variant fine-tuned for instruction-following capabilities, surpassing
Mixtral-8x7B-Instruct. SOLAR 10.7B is publicly available under the Apache 2.0
license, promoting broad access and application in the LLM field.
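The depthwise-scaling half of DUS amounts to simple layer bookkeeping: duplicate the base model's layer stack, drop the last m layers from one copy and the first m from the other, and concatenate. The sketch below shows this arrangement with placeholder layer names; the 32-layer base and m=8 (yielding 48 layers) are assumed to match the SOLAR setup, and the continued-pretraining half of DUS is of course omitted.

```python
def depth_up_scale(layers, m):
    """DUS layer arrangement: first copy minus its last m layers, followed by
    a second copy minus its first m layers."""
    return layers[:-m] + layers[m:]

base = [f"layer_{i}" for i in range(32)]   # assumed 32-layer base model
scaled = depth_up_scale(base, m=8)          # 24 + 24 = 48 layers
```

Because both halves come from the same pretrained weights, the overlap region (layers 8-23 appearing twice) gives the up-scaled model a reasonable initialization for continued pretraining.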
( 2
min )
The Earth mover's distance (EMD) is a useful metric for image recognition and
classification, but its usual implementations are not differentiable or too
slow to be used as a loss function for training other algorithms via gradient
descent. In this paper, we train a convolutional neural network (CNN) to learn
a differentiable, fast approximation of the EMD and demonstrate that it can be
used as a substitute for computing-intensive EMD implementations. We apply this
differentiable approximation in the training of an autoencoder-inspired neural
network (encoder NN) for data compression at the high-luminosity LHC at CERN.
The goal of this encoder NN is to compress the data while preserving the
information related to the distribution of energy deposits in particle
detectors. We demonstrate that the performance of our encoder NN trained using
the differentiable EMD CNN surpasses that of training with loss functions based
on mean squared error.
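To make concrete what the CNN is approximating: in one dimension the EMD between two histograms on a common grid has a closed form, the L1 distance between their cumulative sums. The paper's detector images are 2D, where no such cheap formula applies, which is why a learned approximation helps; the histograms below are illustrative.

```python
import numpy as np

def emd_1d(p, q):
    """Exact 1D Earth mover's distance between histograms on a common
    unit-spaced grid: L1 distance between the cumulative distributions."""
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

p = np.array([1.0, 0.0, 0.0])   # all mass in bin 0
q = np.array([0.0, 0.0, 1.0])   # all mass in bin 2
d = emd_1d(p, q)                # moving 1 unit of mass across 2 bins costs 2
```

Note the 1D form is already differentiable (it is built from `cumsum` and an absolute value), which is the property the CNN surrogate provides in the general case.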
( 3
min )
Sound event detection (SED), as a core module of acoustic environmental
analysis, suffers from the problem of data deficiency. The integration of
semi-supervised learning (SSL) largely mitigates such problem while bringing no
extra annotation budget. This paper investigates several core modules of SSL,
and introduces a random consistency training (RCT) strategy. First, a
self-consistency loss is proposed to fuse with the teacher-student model to
stabilize the training. Second, a hard mixup data augmentation is proposed to
account for the additive property of sounds. Third, a random augmentation
scheme is applied to flexibly combine different types of data augmentations.
Experiments show that the proposed strategy outperforms other widely used
strategies.
( 2
min )
Semi-supervised learning (SSL) approaches have been successfully applied in a
wide range of engineering and scientific fields. This paper investigates the
generative model framework with a missingness mechanism for unclassified
observations, as introduced by Ahfock and McLachlan (2020). We show that in a
partially classified sample, a classifier using Bayes rule of allocation with a
missing-data mechanism can surpass a fully supervised classifier in a two-class
normal homoscedastic model, especially with moderate to low overlap and
proportion of missing class labels, or with large overlap but few missing
labels. It also outperforms a classifier with no missing-data mechanism
regardless of the overlap region or the proportion of missing class labels. Our
exploration of two- and three-component normal mixture models with unequal
covariances through simulations further corroborates our findings. Finally, we
illustrate the use of the proposed classifier with a missing-data mechanism on
interneuronal and skin lesion datasets.
( 2
min )
This paper introduces AIJack, an open-source library designed to assess
security and privacy risks associated with the training and deployment of
machine learning models. Amid the growing interest in big data and AI,
advancements in machine learning research and business are accelerating.
However, recent studies reveal potential threats, such as the theft of training
data and the manipulation of models by malicious attackers. Therefore, a
comprehensive understanding of machine learning's security and privacy
vulnerabilities is crucial for the safe integration of machine learning into
real-world products. AIJack aims to address this need by providing a library
with various attack and defense methods through a unified API. The library is
publicly available on GitHub (https://github.com/Koukyosyumei/AIJack).
( 2
min )
The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular
subspaces of a matrix change when subjected to a small perturbation. This
classic result is sharp in the worst case scenario. In this paper, we prove a
stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the
perturbation is a Gaussian random matrix. Under certain structural assumptions,
we obtain an optimal bound that significantly improves upon the classic
Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new
perturbation bound for the singular values, which may be of independent
interest.
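The quantity the theorem controls is easy to check numerically: the largest principal-angle sine between two subspaces with orthonormal bases U and V is sqrt(1 - sigma_min(U^T V)^2). The snippet below compares the top singular subspace of a matrix before and after a small Gaussian perturbation; the matrix and noise scale are illustrative.

```python
import numpy as np

def sin_theta(U, V):
    """Largest principal-angle sine between the column spaces of U and V
    (both assumed to have orthonormal columns)."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.sqrt(max(0.0, 1.0 - s.min() ** 2))

rng = np.random.default_rng(0)
A = np.diag([10.0, 8.0, 1.0, 0.5])       # clear gap after the top two values
E = 0.05 * rng.standard_normal((4, 4))   # small Gaussian perturbation
U = np.linalg.svd(A)[0][:, :2]           # top-2 left singular subspace of A
U_pert = np.linalg.svd(A + E)[0][:, :2]  # same subspace of the perturbation
angle = sin_theta(U, U_pert)
```

The classic worst-case bound scales like ||E|| divided by the singular-value gap; with a strong gap, as here, the subspace barely rotates.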
( 2
min )
This paper presents a systematic literature review (SLR) on the
explainability and interpretability of machine learning (ML) models within the
context of predictive process mining, using the PRISMA framework. Given the
rapid advancement of artificial intelligence (AI) and ML systems, understanding
the "black-box" nature of these technologies has become increasingly critical.
Focusing specifically on the domain of process mining, this paper delves into
the challenges of interpreting ML models trained with complex business process
data. We differentiate between intrinsically interpretable models and those
that require post-hoc explanation techniques, providing a comprehensive
overview of the current methodologies and their applications across various
domains. Through a rigorous bibliographic analysis, this research
offers a detailed synthesis of the state of explainability and interpretability
in predictive process mining, identifying key trends, challenges, and future
directions. Our findings aim to equip researchers and practitioners with a
deeper understanding of how to develop and implement more trustworthy,
transparent, and effective intelligent systems for predictive process
analytics.
( 2
min )
Time series forecasting plays a crucial role in diverse fields, necessitating
the development of robust models that can effectively handle complex temporal
patterns. In this article, we present a novel feature selection method embedded
in Long Short-Term Memory networks, leveraging a multi-objective evolutionary
algorithm. Our approach optimizes the weights and biases of the LSTM in a
partitioned manner, with each objective function of the evolutionary algorithm
targeting the root mean square error in a specific data partition. The set of
non-dominated forecast models identified by the algorithm is then utilized to
construct a meta-model through stacking-based ensemble learning. Furthermore,
our proposed method provides an avenue for attribute importance determination,
as the frequency of selection for each attribute in the set of non-dominated
forecasting models reflects their significance. This attribute importance
insight adds an interpretable dimension to the forecasting process.
Experimental evaluations on air quality time series data from Italy and
southeast Spain demonstrate that our method substantially improves the
generalization ability of conventional LSTMs, effectively reducing overfitting.
Comparative analyses against state-of-the-art CancelOut and EAR-FS methods
highlight the superior performance of our approach.
( 2
min )
Approximate Computing (AxC) techniques have become increasingly popular in
trading off accuracy for performance gains in various applications. Selecting
the best AxC techniques for a given application is challenging. Among proposed
approaches for exploring the design space, Machine Learning approaches such as
Reinforcement Learning (RL) show promising results. In this paper, we propose
an RL-based multi-objective Design Space Exploration strategy to find the
approximate versions of the application that balance accuracy degradation and
power and computation time reduction. Our experimental results show a good
trade-off between accuracy degradation and decreased power and computation time
for some benchmarks.
( 2
min )
Online display advertising platforms serve numerous advertisers by
providing real-time bidding (RTB) at the scale of billions of ad requests
every day. The bidding strategy handles ad requests across multiple channels to
maximize the number of clicks under set financial constraints, such as the
total budget and cost-per-click (CPC). Different from existing works mainly
focusing on single channel bidding, we explicitly consider cross-channel
constrained bidding with budget allocation. Specifically, we propose a
hierarchical offline deep reinforcement learning (DRL) framework called
``HiBid'', consisting of a high-level planner equipped with an auxiliary loss for
non-competitive budget allocation, and a data augmentation enhanced low-level
executor for adaptive bidding strategy in response to allocated budgets.
Additionally, a CPC-guided action selection mechanism is introduced to satisfy
the cross-channel CPC constraint. Through extensive experiments on both the
large-scale log data and online A/B testing, we confirm that HiBid outperforms
six baselines in terms of the number of clicks, CPC satisfactory ratio, and
return-on-investment (ROI). We have also deployed HiBid on the Meituan
advertising platform, where it already serves tens of thousands of advertisers
every day.
( 2
min )
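The CPC-guided action selection mechanism can be illustrated with a simple feasibility filter: keep only bid actions whose projected cost-per-click respects the constraint, then pick the one with the most expected clicks. This is a hypothetical sketch of the idea; HiBid's actual mechanism operates inside a DRL executor and is not reproduced here.

```python
def cpc_guided_select(candidates, cpc_limit):
    """Pick the bid action with the highest expected clicks among those
    whose projected cost-per-click stays within the constraint; fall back
    to the cheapest-CPC action if none qualifies. Illustrative only."""
    feasible = [c for c in candidates if c["cost"] / c["clicks"] <= cpc_limit]
    if feasible:
        return max(feasible, key=lambda c: c["clicks"])["bid"]
    return min(candidates, key=lambda c: c["cost"] / c["clicks"])["bid"]

# hypothetical candidate actions with predicted clicks and cost
actions = [
    {"bid": 1.0, "clicks": 10, "cost": 8.0},   # projected CPC 0.8
    {"bid": 2.0, "clicks": 30, "cost": 45.0},  # projected CPC 1.5
    {"bid": 1.5, "clicks": 20, "cost": 22.0},  # projected CPC 1.1
]
chosen = cpc_guided_select(actions, cpc_limit=1.2)
```

The highest-click action (bid 2.0) is rejected for violating the CPC limit, so the filter settles on the best feasible one.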
In many real-world problems, there is a limited set of training data, but an
abundance of unlabeled data. We propose a new method, Generative Posterior
Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in
high-dimensional problems. A GPN is a generative model that, given a prior
distribution over functions, approximates the posterior distribution directly
by regularizing the network towards samples from the prior. We prove
theoretically that our method indeed approximates the Bayesian posterior and
show empirically that it improves epistemic uncertainty estimation and
scalability over competing methods.
( 2
min )
Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam,
AdaGrad, and AdaHessian utilize a preconditioner that modifies the search
direction by incorporating information about the curvature of the objective
function. However, despite their adaptive characteristics, these methods still
require manual fine-tuning of the step-size. This, in turn, impacts the time
required to solve a particular problem. This paper presents an optimization
framework named SANIA to tackle these challenges. Beyond eliminating the need
for manual step-size hyperparameter settings, SANIA incorporates techniques to
address poorly scaled or ill-conditioned problems. We also explore several
preconditioning methods, including Hutchinson's method, which approximates the
Hessian diagonal of the loss function. We conclude with an extensive empirical
examination of the proposed techniques across classification tasks, covering
both convex and non-convex contexts.
( 2
min )
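Hutchinson's method, mentioned above as one of the explored preconditioners, estimates the Hessian diagonal from Hessian-vector products alone. A minimal sketch, using a fixed toy matrix in place of the autodiff-supplied Hessian-vector product:

```python
import numpy as np

def hutchinson_diag(hvp, dim, n_samples=500, rng=None):
    """Estimate the diagonal of a matrix H given only Hessian-vector
    products hvp(v) = H @ v, using Hutchinson's estimator
    diag(H) ~= E[z * (H z)] with Rademacher probes z in {-1, +1}^dim."""
    rng = np.random.default_rng(rng)
    est = np.zeros(dim)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        est += z * hvp(z)
    return est / n_samples

# toy symmetric "Hessian"; in practice hvp comes from automatic differentiation
H = np.array([[2.0, 1.0], [1.0, 3.0]])
diag_est = hutchinson_diag(lambda v: H @ v, dim=2, n_samples=500, rng=0)
```

The off-diagonal contributions average to zero across probes, so the estimate concentrates on the true diagonal [2, 3] as the sample count grows.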
This paper introduces Auto-modeling of Formal Verification with Real-world
Prompting for 5G and NextG protocols (AVRE), a novel system designed for the
formal verification of Next Generation (NextG) communication protocols,
addressing the increasing complexity and scalability challenges in network
protocol design and verification. Utilizing Large Language Models (LLMs), AVRE
transforms protocol descriptions into dependency graphs and formal models,
efficiently resolving ambiguities and capturing design intent. The system
integrates a transformer model with LLMs to autonomously establish quantifiable
dependency relationships through cross- and self-attention mechanisms. Enhanced
by iterative feedback from the HyFuzz experimental platform, AVRE significantly
advances the accuracy and relevance of formal verification in complex
communication protocols, offering a groundbreaking approach to validating
sophisticated communication systems. We compare CAL's performance with
state-of-the-art LLM-based models and traditional time sequence models,
demonstrating its superiority in accuracy and robustness, achieving an accuracy
of 95.94\% and an AUC of 0.98. This NLP-based approach enables, for the first
time, the creation of exploits directly from design documents, making
remarkable progress in scalable system verification and validation.
( 2
min )
Although database systems perform well in data access and manipulation, their
relational model hinders data scientists from formulating machine learning
algorithms in SQL. Nevertheless, we argue that modern database systems perform
well for machine learning algorithms expressed in relational algebra. To
overcome the barrier of the relational model, this paper shows how to transform
data into a relational representation for training neural networks in SQL: We
first describe building blocks for data transformation, model training and
inference in SQL-92 and their counterparts using an extended array data type.
Then, we compare the implementation for model training and inference using
array data types to the one using a relational representation in SQL-92 only.
The evaluation in terms of runtime and memory consumption proves the
suitability of modern database systems for matrix algebra, although specialised
array data types perform better than matrices in relational representation.
( 2
min )
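The relational building block at the heart of this approach is matrix multiplication over a (row, column, value) representation: a join on the inner index followed by a grouped sum. A minimal illustration of that SQL-92-style pattern via Python's sqlite3 (illustrative, not the paper's code):

```python
import sqlite3

# Matrices stored relationally as (i, j, v) triples; the product
# C = A @ B is a join on A.j = B.i with a grouped SUM.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A (i INT, j INT, v REAL);
CREATE TABLE B (i INT, j INT, v REAL);
""")
con.executemany("INSERT INTO A VALUES (?,?,?)",
                [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)])
con.executemany("INSERT INTO B VALUES (?,?,?)",
                [(0, 0, 5.0), (0, 1, 6.0), (1, 0, 7.0), (1, 1, 8.0)])
rows = con.execute("""
SELECT A.i, B.j, SUM(A.v * B.v)
FROM A JOIN B ON A.j = B.i
GROUP BY A.i, B.j
ORDER BY A.i, B.j
""").fetchall()
```

For [[1,2],[3,4]] times [[5,6],[7,8]] this yields the expected [[19,22],[43,50]], each cell as one output row.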
Understanding intermediate representations of the concepts learned by deep
learning classifiers is indispensable for interpreting general model behaviors.
Existing approaches to reveal learned concepts often rely on human supervision,
such as pre-defined concept sets or segmentation processes. In this paper, we
propose a novel unsupervised method for discovering distributed representations
of concepts by selecting a principal subset of neurons. Our empirical findings
demonstrate that instances with similar neuron activation states tend to share
coherent concepts. Based on the observations, the proposed method selects
principal neurons that construct an interpretable region, namely a Relaxed
Decision Region (RDR), encompassing instances with coherent concepts in the
feature space. It can be utilized to identify unlabeled subclasses within data
and to detect the causes of misclassifications. Furthermore, the applicability
of our method across various layers discloses distinct distributed
representations over the layers, which provides deeper insights into the
internal mechanisms of the deep learning model.
( 2
min )
In healthcare, patient data is often collected as multivariate time series,
providing a comprehensive view of a patient's health status over time. While
this data can be sparse, connected devices may enhance its frequency. The goal
is to create patient profiles from these time series. In the absence of labels,
a predictive model can be used to predict future values while forming a latent
cluster space, evaluated based on predictive performance. We compare two models
on Withings' datasets: MagmaClust, which clusters entire time series, and
DGM${}^2$, which allows the group affiliation of an individual to change over
time (dynamic clustering).
( 2
min )
In this research, we introduce RefineNet, a novel architecture designed to
address resolution limitations in text-to-image conversion systems. We explore
the challenges of generating high-resolution images from textual descriptions,
focusing on the trade-offs between detail accuracy and computational
efficiency. RefineNet leverages a hierarchical Transformer combined with
progressive and conditional refinement techniques, outperforming existing
models in producing detailed and high-quality images. Through extensive
experiments on diverse datasets, we demonstrate RefineNet's superiority in
clarity and resolution, particularly in complex image categories like animals,
plants, and human faces. Our work not only advances the field of text-to-image
conversion but also opens new avenues for high-fidelity image generation in
various applications.
( 2
min )
Engineering system design, viewed as a decision-making process, faces
challenges due to complexity and uncertainty. In this paper, we present a
framework proposing the use of the Deep Q-learning algorithm to optimize the
design of engineering systems. We outline a step-by-step framework for
optimizing engineering system designs. The goal is to find policies that
maximize the output of a simulation model given multiple sources of
uncertainties. The proposed algorithm handles linear and non-linear multi-stage
stochastic problems, where decision variables are discrete, and the objective
function and constraints are assessed via a Monte Carlo simulation. We
demonstrate the effectiveness of our proposed framework by solving two
engineering system design problems in the presence of multiple uncertainties,
such as price and demand.
( 2
min )
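The core of Deep Q-learning is the Q-value update; the framework above wraps it around a Monte Carlo simulation of the design objective. A tabular sketch of just the update rule (the deep variant replaces the table with a neural network; the state and action names here are made up):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Here r would come from a Monte Carlo evaluation of the design."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q

Q = {}
q_update(Q, s="design_0", a="thicken_wall", r=1.0, s_next="design_1",
         actions=["thicken_wall", "keep"])
```

With an empty table the target reduces to the immediate reward, so the first update stores alpha * r = 0.1 for the visited state-action pair.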
This work proposes $\mu$GUIDE: a general Bayesian framework to estimate
posterior distributions of tissue microstructure parameters from any given
biophysical model or MRI signal representation, with exemplar demonstration in
diffusion-weighted MRI. Harnessing a new deep learning architecture for
automatic signal feature selection combined with simulation-based inference and
efficient sampling of the posterior distributions, $\mu$GUIDE bypasses the high
computational and time cost of conventional Bayesian approaches and does not
rely on acquisition constraints to define model-specific summary statistics.
The obtained posterior distributions allow us to highlight degeneracies present in
the model definition and quantify the uncertainty and ambiguity of the
estimated parameters.
( 2
min )
To plan and optimize energy storage demands that account for Li-ion battery
aging dynamics, techniques need to be developed to diagnose battery internal
states accurately and rapidly. This study seeks to reduce the computational
resources needed to determine a battery's internal states by replacing
physics-based Li-ion battery models -- such as the single-particle model (SPM)
and the pseudo-2D (P2D) model -- with a physics-informed neural network (PINN)
surrogate. The surrogate model makes high-throughput techniques, such as
Bayesian calibration, tractable to determine battery internal parameters from
voltage responses. This manuscript is the first of a two-part series that
introduces PINN surrogates of Li-ion battery models for parameter inference
(i.e., state-of-health diagnostics). In this first part, a method is presented
for constructing a PINN surrogate of the SPM. A multi-fidelity hierarchical
training, where several neural nets are trained with multiple physics-loss
fidelities is shown to significantly improve the surrogate accuracy when only
training on the governing equation residuals. The implementation is made
available in a companion repository (https://github.com/NREL/pinnstripes). Part
II extends these techniques to a PINN surrogate of the P2D battery model and
explores the Bayesian calibration capabilities of both surrogates.
( 3
min )
Domain generalization focuses on leveraging knowledge from multiple related
domains with ample training data and labels to enhance inference on unseen
in-distribution (IN) and out-of-distribution (OOD) domains. In our study, we
introduce a two-phase representation learning technique using multi-task
learning. This approach aims to cultivate a latent space from features spanning
multiple domains, encompassing both native and cross-domains, to amplify
generalization to IN and OOD territories. Additionally, we attempt to
disentangle the latent space by minimizing the mutual information between the
prior and latent space, effectively de-correlating spurious feature
correlations. Collectively, the joint optimization will facilitate
domain-invariant feature learning. We assess the model's efficacy across
multiple cybersecurity datasets, using standard classification metrics on both
unseen IN and OOD sets, and juxtapose the results with contemporary domain
generalization methods.
( 2
min )
Despite the success of graph neural networks (GNNs) in various domains, they
exhibit susceptibility to adversarial attacks. Understanding these
vulnerabilities is crucial for developing robust and secure applications. In
this paper, we investigate the impact of test time adversarial attacks through
edge perturbations which involve both edge insertions and deletions. A novel
explainability-based method is proposed to identify important nodes in the
graph and perform edge perturbation between these nodes. The proposed method is
tested for node classification with three different architectures and datasets.
The results suggest that introducing edges between nodes of different classes
has higher impact as compared to removing edges among nodes within the same
class.
( 2
min )
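The most damaging perturbation found above, inserting edges between nodes of different classes, can be sketched as a set operation on the edge list; the explainability-based node-importance selection step is assumed given here:

```python
def perturb_edges(edges, node_class, important):
    """Insert an edge between every pair of important nodes that belong
    to different classes (the inter-class insertion found most harmful
    above). Edges are undirected pairs (u, v) with u < v."""
    edges = set(edges)
    for u in important:
        for v in important:
            if u < v and node_class[u] != node_class[v]:
                edges.add((u, v))
    return edges

# toy graph: nodes 0, 1 in class "a", node 2 in class "b"
cls = {0: "a", 1: "a", 2: "b"}
new_edges = perturb_edges({(0, 1)}, cls, important=[0, 1, 2])
```

Only the cross-class pairs (0,2) and (1,2) are added; the same-class pair (0,1) is left alone.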
Bayesian parameter inference is useful to improve Li-ion battery diagnostics
and can help formulate battery aging models. However, it is computationally
intensive and cannot be easily repeated for multiple cycles, multiple operating
conditions, or multiple replicate cells. To reduce the computational cost of
Bayesian calibration, numerical solvers for physics-based models can be
replaced with faster surrogates. A physics-informed neural network (PINN) is
developed as a surrogate for the pseudo-2D (P2D) battery model calibration. For
the P2D surrogate, additional training regularization was needed as compared to
the PINN single-particle model (SPM) developed in Part I. Both the PINN SPM and
P2D surrogate models are exercised for parameter inference and compared to data
obtained from a direct numerical solution of the governing equations. A
parameter inference study highlights the ability to use these PINNs to
calibrate scaling parameters for the cathode Li diffusion and the anode
exchange current density. By realizing computational speed-ups of 2250x for the
P2D model, as compared to using standard integrating methods, the PINN
surrogates enable rapid state-of-health diagnostics. In the low-data
availability scenario, the testing error was estimated at 2 mV for the SPM
surrogate and 10 mV for the P2D surrogate, which could be mitigated with
additional data.
( 3
min )
Generalization remains a major problem in supervised learning of
single-channel speech enhancement. In this work, we propose learnable loss
mixup (LLM), a simple and effortless training paradigm, to improve the
generalization of deep learning-based speech enhancement models. Loss mixup, of
which learnable loss mixup is a special variant, optimizes a mixture of the
loss functions of random sample pairs to train a model on virtual training data
constructed from these pairs of samples. In learnable loss mixup, by
conditioning on the mixed data, the loss functions are mixed using a non-linear
mixing function automatically learned via neural parameterization. Our
experimental results on the VCTK benchmark show that learnable loss mixup
achieves 3.26 PESQ, outperforming the state-of-the-art.
( 2
min )
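Plain loss mixup, of which the proposed LLM is a learnable variant, can be sketched as follows: mix a random pair of inputs and mix their two per-target losses with the same coefficient. The toy mean-predictor loss below is an illustrative stand-in for a speech enhancement model.

```python
def loss_mixup(model_loss, pair, lam):
    """Loss mixup on one sample pair: train on the virtual input
    x_mix = lam*x1 + (1-lam)*x2 and mix the two per-target losses with
    the same coefficient. (LLM replaces this fixed linear mixing of
    losses with a learned non-linear mixing function.)"""
    (x1, y1), (x2, y2) = pair
    x_mix = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    return lam * model_loss(x_mix, y1) + (1 - lam) * model_loss(x_mix, y2)

def mse(x, y):
    """Toy squared-error loss of a model that predicts the input mean."""
    pred = sum(x) / len(x)
    return (pred - y) ** 2

loss = loss_mixup(mse, pair=(([1.0, 3.0], 2.0), ([5.0, 7.0], 6.0)), lam=0.5)
```

With lam = 0.5 the virtual input averages the pair, and the mixed loss penalizes the prediction against both original targets.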
Machine learning and data mining techniques are utilized to enhance the
security of networks. Researchers have used machine learning for pattern
detection, anomaly detection, dynamic policy setting, etc. These methods allow
a program to learn from data and make decisions without human intervention,
at the cost of long training periods and substantial computation power. This
paper discusses a novel technique to predict an upcoming attack in a network
based on several data parameters. The dataset is continuous in real-time
implementation. The proposed model comprises dataset pre-processing and
training, followed by a testing phase. Based on the results of the testing
phase, the best model is selected and used to extract the event class that may
lead to an attack. The event statistics are then used for attack prediction.
( 2
min )
The 0/1 matrix factorization defines matrix products using logical AND and OR
as product-sum operators, revealing the factors influencing various decision
processes. Instances and their characteristics are arranged in rows and
columns. Formulating matrix factorization as an energy minimization problem and
exploring it with Simulated Annealing (SA) theoretically enables finding a
minimum solution in sufficient time. However, searching for the optimal
solution in practical time becomes problematic when the energy landscape has
many plateaus with flat slopes. In this work, we propose a method to facilitate
the solution process by applying a gradient to the energy landscape, using a
rectified linear type cost function readily available in modern annealing
machines. We also propose a method to quickly obtain a solution by updating the
cost function's gradient during the search process. Numerical experiments were
conducted, confirming the method's effectiveness with both noise-free
artificial and real data.
( 2
min )
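The AND/OR product-sum that defines 0/1 matrix factorization can be written directly; reconstructing a matrix from Boolean factors is then just this product:

```python
def bool_matmul(A, B):
    """0/1 matrix product with AND as multiplication and OR as the sum
    operator: C[i][j] = OR_k (A[i][k] AND B[k][j])."""
    n, k, m = len(A), len(B), len(B[0])
    return [[int(any(A[i][t] and B[t][j] for t in range(k)))
             for j in range(m)] for i in range(n)]

# a toy rank-2 Boolean factorization: X = F o G
F = [[1, 0], [1, 1], [0, 1]]
G = [[1, 1, 0], [0, 1, 1]]
X = bool_matmul(F, G)
```

The factorization problem runs this in reverse: given X, find low-rank 0/1 factors F and G whose AND/OR product reproduces it, which is the energy-minimization target the annealing machines search over.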
This paper describes a machine learning method to automate reading of cockpit
gauges, using a CNN to invert affine transformations and deduce aircraft states
from instrument images. Validated with synthetic images of a turn-and-bank
indicator, this research introduces methods such as generating datasets from a
single image, the 'Clean Training Principle' for optimal noise-free training,
and CNN interpolation for continuous value predictions from categorical data.
It also offers insights into hyperparameter optimization and ML system software
engineering.
( 2
min )
Personalized Federated Learning (PFL) relies on collective data knowledge to
build customized models. However, non-IID data between clients poses
significant challenges, as collaborating with clients who have diverse data
distributions can harm local model performance, especially with limited
training data. To address this issue, we propose FedACS, a new PFL algorithm
with an Attention-based Client Selection mechanism. FedACS integrates an
attention mechanism to enhance collaboration among clients with similar data
distributions and mitigate the data scarcity issue. It prioritizes and
allocates resources based on data similarity. We further establish the
theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST
validate FedACS's superiority, showcasing its potential to advance personalized
federated learning. By tackling non-IID data challenges and data scarcity,
FedACS offers promising advances in the field of personalized federated
learning.
( 2
min )
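A generic sketch of attention-based client selection: turn client-similarity scores into aggregation weights with a softmax, so clients with similar data distributions dominate the collaboration. The similarity scores and temperature are placeholders; FedACS's exact mechanism is not reproduced here.

```python
import math

def attention_weights(similarities, temperature=1.0):
    """Softmax attention over client-similarity scores: clients whose
    data distribution looks closer to the target client receive larger
    aggregation weights. Uses the max-subtraction trick for stability."""
    scaled = [s / temperature for s in similarities]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical similarities of three candidate clients to the target client
w = attention_weights([0.9, 0.1, 0.9])
```

The weights sum to one, and the two similar clients receive equal, larger shares than the dissimilar one.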
Semi-supervised learning (SSL) approaches have been successfully applied in a
wide range of engineering and scientific fields. This paper investigates the
generative model framework with a missingness mechanism for unclassified
observations, as introduced by Ahfock and McLachlan (2020). We show that in a
partially classified sample, a classifier using Bayes rule of allocation with a
missing-data mechanism can surpass a fully supervised classifier in a two-class
normal homoscedastic model, especially with moderate to low overlap and
proportion of missing class labels, or with large overlap but few missing
labels. It also outperforms a classifier with no missing-data mechanism
regardless of the overlap region or the proportion of missing class labels. Our
exploration of two- and three-component normal mixture models with unequal
covariances through simulations further corroborates our findings. Finally, we
illustrate the use of the proposed classifier with a missing-data mechanism on
interneuronal and skin lesion datasets.
( 2
min )
The Davis-Kahan-Wedin $\sin \Theta$ theorem describes how the singular
subspaces of a matrix change when subjected to a small perturbation. This
classic result is sharp in the worst case scenario. In this paper, we prove a
stochastic version of the Davis-Kahan-Wedin $\sin \Theta$ theorem when the
perturbation is a Gaussian random matrix. Under certain structural assumptions,
we obtain an optimal bound that significantly improves upon the classic
Davis-Kahan-Wedin $\sin \Theta$ theorem. One of our key tools is a new
perturbation bound for the singular values, which may be of independent
interest.
( 2
min )
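For context, the deterministic bound that the stochastic result sharpens can be stated as follows (one common textbook form; the exact constant and norm choice vary across statements):

```latex
% Let U span the leading left singular subspace of A, and let
% \widetilde{U} be its counterpart for the perturbed matrix A + E.
% If \delta > 0 is the gap between the singular values captured by U
% and the remaining ones, then
\| \sin \Theta(U, \widetilde{U}) \| \;\le\; \frac{C \, \| E \|}{\delta},
% for a modest absolute constant C (e.g. C = 2 in the Yu-Wang-Samworth
% variant of the Davis-Kahan theorem).
```

The worst-case sharpness of this bound is attained by adversarial perturbations; the paper's point is that a Gaussian random E typically rotates the subspace far less.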
This paper presents a systematic literature review (SLR) on the
explainability and interpretability of machine learning (ML) models within the
context of predictive process mining, using the PRISMA framework. Given the
rapid advancement of artificial intelligence (AI) and ML systems, understanding
the "black-box" nature of these technologies has become increasingly critical.
Focusing specifically on the domain of process mining, this paper delves into
the challenges of interpreting ML models trained with complex business process
data. We differentiate between intrinsically interpretable models and those
that require post-hoc explanation techniques, providing a comprehensive
overview of the current methodologies and their applications across various
application domains. Through a rigorous bibliographic analysis, this research
offers a detailed synthesis of the state of explainability and interpretability
in predictive process mining, identifying key trends, challenges, and future
directions. Our findings aim to equip researchers and practitioners with a
deeper understanding of how to develop and implement more trustworthy,
transparent, and effective intelligent systems for predictive process
analytics.
( 2
min )
It’s not technology advancements that are the game-changers. The game-changer is how those technological advancements are leveraged to economically transform industries and society. 2024 is going to be a big year, especially in the realm of Artificial Intelligence (AI). Generative AI (GenAI) has lit a fire under organizations that suddenly have a senior management and…
The post GenAI: Beware the Productivity Trap; It’s About Economics – Part 1 appeared first on Data Science Central.
( 22
min )
We propose StyleCap, a method to generate natural language descriptions of
speaking styles appearing in speech. Although most conventional techniques
for para-/non-linguistic information recognition focus on category
classification or intensity estimation of pre-defined labels, they cannot
provide the reasoning behind a recognition result in an interpretable manner.
StyleCap is a first step towards an end-to-end method for generating
speaking-style prompts from speech, i.e., automatic speaking-style captioning.
StyleCap is trained with paired data of speech and natural language
descriptions. We train neural networks that convert a speech representation
vector into prefix vectors that are fed into a large language model (LLM)-based
text decoder. We explore an appropriate text decoder and speech feature
representation suitable for this new task. The experimental results demonstrate
that our StyleCap leveraging richer LLMs for the text decoder, speech
self-supervised learning (SSL) features, and sentence rephrasing augmentation
improves the accuracy and diversity of generated speaking-style captions.
Samples of speaking-style captions generated by our StyleCap are publicly
available.
( 2
min )
In this note, we consider the highly nonconvex optimization problem
associated with computing the rank decomposition of symmetric tensors. We
formulate the invariance properties of the loss function and show that critical
points detected by standard gradient based methods are \emph{symmetry breaking}
with respect to the target tensor. These phenomena, observed for different
choices of target tensors and norms, make it possible to apply recently
developed analytic and algebraic tools for studying nonconvex optimization
landscapes that exhibit symmetry breaking of a similar nature.
( 2
min )
Despite the breakthroughs in biomarker discovery facilitated by differential
gene analysis, challenges remain, particularly at the single-cell level.
Traditional methodologies heavily rely on user-supplied cell annotations,
focusing on individually expressed data, often neglecting the critical
interactions between biological conditions, such as healthy versus diseased
states. In response, here we introduce scBeacon, an innovative framework built
upon a deep contrastive siamese network. scBeacon pioneers an unsupervised
approach, adeptly identifying matched cell populations across varied
conditions, enabling a refined differential gene analysis. By utilizing a
VQ-VAE framework, a contrastive siamese network, and a greedy iterative
strategy, scBeacon effectively pinpoints differential genes that hold potential
as key biomarkers. Comprehensive evaluations on a diverse array of datasets
validate scBeacon's superiority over existing single-cell differential gene
analysis tools. Its precision and adaptability underscore its significant role
in enhancing diagnostic accuracy in biomarker discovery. With the emphasis on
the importance of biomarkers in diagnosis, scBeacon is positioned to be a
pivotal asset in the evolution of personalized medicine and targeted
treatments.
( 2
min )
Federated bilevel optimization (FBO) has shown great potential recently in
machine learning and edge computing due to the emerging nested optimization
structure in meta-learning, fine-tuning, hyperparameter tuning, etc. However,
existing FBO algorithms often involve complicated computations and require
multiple sub-loops per iteration, each of which contains a number of
communication rounds. In this paper, we propose a simple and flexible FBO
framework named SimFBO, which is easy to implement without sub-loops, and
includes a generalized server-side aggregation and update for improving
communication efficiency. We further propose System-level heterogeneity robust
FBO (ShroFBO) as a variant of SimFBO with stronger resilience to heterogeneous
local computation. We show that SimFBO and ShroFBO provably achieve a linear
convergence speedup with partial client participation and client sampling
without replacement, as well as improved sample and communication complexities.
Experiments demonstrate the effectiveness of the proposed methods over existing
FBO algorithms.
( 2
min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Of the inherent
sources we look a little deeper into site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of probabilistic clinical models.
( 2
min )
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively
open-source tools excluding the large language model. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation coupled with the open source silicon 130 nm design tools will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the bar to entry when building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
( 2
min )
To benefit from the modeling capacity of deep models in system
identification, without worrying about inference time, this study presents a
novel training strategy that uses deep models only at the training stage. For
this purpose two separate models with different structures and goals are
employed. The first one is a deep generative model aiming at modeling the
distribution of system output(s), called the teacher model, and the second one
is a shallow basis function model, named the student model, fed by system
input(s) to predict the system output(s). That means these isolated paths must
reach the same ultimate target. As deep models show great performance in
modeling highly nonlinear systems, aligning the representation spaces learned
by the two models makes the student model inherit the approximation power
of the teacher model. The proposed objective function combines the objectives
of the student and teacher models with a distance penalty between the
learned latent representations. The simulation results on three nonlinear
benchmarks show performance comparable to the examined deep architectures
applied to the same benchmarks. Algorithmic transparency and structure
efficiency are also achieved as byproducts.
( 3
min )
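The objective described above, each model's own loss plus a penalty on the distance between the learned latent representations, can be written compactly; squared Euclidean distance is an illustrative choice of metric here, not necessarily the paper's.

```python
def joint_objective(student_loss, teacher_loss, z_student, z_teacher, lam=1.0):
    """Combined teacher-student training objective: the sum of each
    model's own loss plus lam times the (squared Euclidean) distance
    between the two learned latent representations, which is what
    aligns the student's space with the teacher's."""
    dist = sum((a - b) ** 2 for a, b in zip(z_student, z_teacher))
    return student_loss + teacher_loss + lam * dist

# toy latent vectors and per-model losses (hypothetical values)
total = joint_objective(0.5, 0.2, [1.0, 2.0], [1.0, 4.0], lam=0.5)
```

When the latents already coincide the penalty vanishes and the objective reduces to the two individual losses.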
This report summarizes the 4th International Verification of Neural Networks
Competition (VNN-COMP 2023), held as a part of the 6th Workshop on Formal
Methods for ML-Enabled Autonomous Systems (FoMLAS), that was collocated with
the 35th International Conference on Computer-Aided Verification (CAV).
VNN-COMP is held annually to facilitate the fair and objective comparison of
state-of-the-art neural network verification tools, encourage the
standardization of tool interfaces, and bring together the neural network
verification community. To this end, standardized formats for networks (ONNX)
and specification (VNN-LIB) were defined, tools were evaluated on equal-cost
hardware (using an automatic evaluation pipeline based on AWS instances), and
tool parameters were chosen by the participants before the final test sets were
made public. In the 2023 iteration, 7 teams participated on a diverse set of 10
scored and 4 unscored benchmarks. This report summarizes the rules, benchmarks,
participating tools, results, and lessons learned from this iteration of this
competition.
( 2
min )
Though there has been substantial progress in developing quantum algorithms
to study classical datasets, the cost of simply \textit{loading} classical data
is an obstacle to quantum advantage. When the amplitude encoding is used,
loading an arbitrary classical vector requires up to exponential circuit depths
with respect to the number of qubits. Here, we address this ``input problem''
with two contributions. First, we introduce a circuit compilation method based
on tensor network (TN) theory. Our method -- AMLET (Automatic Multi-layer
Loader Exploiting TNs) -- proceeds via careful construction of a specific TN
topology and can be tailored to arbitrary circuit depths. Second, we perform
numerical experiments on real-world classical data from four distinct areas:
finance, images, fluid mechanics, and proteins. To the best of our knowledge,
this is the broadest numerical analysis to date of loading classical data into
a quantum computer. The required circuit depths are often several orders of
magnitude lower than the exponentially-scaling general loading algorithm would
require. Besides introducing a more efficient loading algorithm, this work
demonstrates that many classical datasets are loadable in depths that are much
shorter than previously expected, which has positive implications for speeding
up classical workloads on quantum computers.
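As a concrete illustration of amplitude encoding (not of AMLET itself, whose construction is not given here), a classical vector is padded to the next power-of-two length and L2-normalized; the resulting amplitudes define the n-qubit state a loading circuit must prepare. A minimal sketch, with illustrative names:

```python
import math

def amplitude_encoding_target(vec):
    """Pad a classical vector to the next power-of-two length and
    L2-normalize it: the resulting amplitudes define the n-qubit state
    that a loading circuit must prepare."""
    n = max(1, math.ceil(math.log2(len(vec))))
    dim = 2 ** n
    padded = list(vec) + [0.0] * (dim - len(vec))
    norm = math.sqrt(sum(x * x for x in padded))
    return n, [x / norm for x in padded]

n_qubits, amps = amplitude_encoding_target([3.0, 4.0])
# 1 qubit, amplitudes [0.6, 0.8]
```

The state dimension 2^n is what makes arbitrary loading expensive: a general circuit must fix exponentially many amplitudes.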
( 3
min )
We propose INFAMOUS-NeRF, an implicit morphable face model that introduces
hypernetworks to NeRF to improve the representation power in the presence of
many training subjects. At the same time, INFAMOUS-NeRF resolves the classic
hypernetwork tradeoff of representation power and editability by learning
semantically-aligned latent spaces despite the subject-specific models, all
without requiring a large pretrained model. INFAMOUS-NeRF further introduces a
novel constraint to improve NeRF rendering along the face boundary. Our
constraint can leverage photometric surface rendering and multi-view
supervision to guide surface color prediction and improve rendering near the
surface. Finally, we introduce a novel, loss-guided adaptive sampling method
for more effective NeRF training by reducing the sampling redundancy. We show
quantitatively and qualitatively that our method achieves higher representation
power than prior face modeling methods in both controlled and in-the-wild
settings. Code and models will be released upon publication.
( 2
min )
The use of Mixed-Integer Linear Programming (MILP) models to represent neural
networks with Rectified Linear Unit (ReLU) activations has become increasingly
widespread in the last decade. This has enabled the use of MILP technology to
test (or stress) their behavior, to adversarially improve their training, and to
embed them in optimization models leveraging their predictive power. Many of
these MILP models rely on activation bounds, that is, bounds on the input
values of each neuron. In this work, we explore the tradeoff between the
tightness of these bounds and the computational effort of solving the resulting
MILP models. We provide guidelines for implementing these models based on the
impact of network structure, regularization, and rounding.
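Activation bounds of this kind are typically obtained by propagating input intervals through the network; a minimal interval-arithmetic sketch (the function name and list-based matrix representation are illustrative, not from the paper):

```python
def interval_bounds(layers, lo, hi):
    """Propagate interval activation bounds through affine + ReLU layers.
    layers: list of (W, b) with W given as a list of rows.
    Returns per-layer pre-activation (lower, upper) bounds."""
    bounds = []
    for W, b in layers:
        new_lo, new_hi = [], []
        for row, bi in zip(W, b):
            # Lower bound uses lo where the weight is positive, hi otherwise.
            l = bi + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
            u = bi + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
            new_lo.append(l)
            new_hi.append(u)
        bounds.append((new_lo, new_hi))
        # ReLU clips post-activations at zero before the next layer.
        lo = [max(0.0, l) for l in new_lo]
        hi = [max(0.0, u) for u in new_hi]
    return bounds

# One neuron computing x1 - x2 on inputs in [0, 1]^2: pre-activation in [-1, 1].
interval_bounds([([[1.0, -1.0]], [0.0])], [0.0, 0.0], [1.0, 1.0])
```

Tighter bounds shrink the big-M constants in the ReLU MILP encoding, which is precisely the tightness-versus-effort tradeoff the paper studies.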
( 2
min )
This study presents a novel approach to addressing the challenge of missing
data in multivariate time series, with a particular focus on the complexities
of healthcare data. Our Conditional Self-Attention Imputation (CSAI) model,
grounded in a transformer-based framework, introduces a conditional hidden
state initialization tailored to the intricacies of medical time series data.
This methodology diverges from traditional imputation techniques by
specifically targeting the imbalance in missing data distribution, a crucial
aspect often overlooked in healthcare datasets. By integrating advanced
knowledge embedding and a non-uniform masking strategy, CSAI adeptly adjusts to
the distinct patterns of missing data in Electronic Health Records (EHRs).
( 2
min )
The recycling of waste electrical and electronic equipment is an essential
tool in allowing for a circular economy, presenting the potential for
significant environmental and economic gain. However, traditional material
separation techniques, based on physical and chemical processes, require
substantial investment and do not apply to all cases. In this work, we
investigate using an image classification neural network as a potential means
to control an automated material separation process in treating smartphone
waste, acting as a more efficient, less costly, and more widely applicable
alternative to existing tools. We produced a dataset with 1,127 images of
pyrolyzed smartphone components, which was then used to train and assess a
VGG-16 image classification model. The model achieved 83.33% accuracy, lending
credence to the viability of using such a neural network in material
separation.
( 2
min )
Curriculum learning and imitation learning have been leveraged extensively in
the robotics domain. However, minimal research has been done on applying
these ideas to control tasks over highly stochastic time-series data. Here, we
theoretically and empirically explore these approaches in a representative
control task over complex time-series data. We implement the fundamental ideas
of curriculum learning via data augmentation, while imitation learning is
implemented via policy distillation from an oracle. Our findings reveal that
curriculum learning should be considered a novel direction in improving
control-task performance over complex time series. Our extensive multi-seed,
out-of-sample experiments and ablation studies strongly support curriculum
learning for time-series control. These findings are especially notable as
we tune all overlapping hyperparameters on the baseline -- giving an advantage
to the baseline. On the other hand, we find that imitation learning should be
used with caution.
( 2
min )
Personalized Federated Learning (PFL) relies on collective data knowledge to
build customized models. However, non-IID data between clients poses
significant challenges, as collaborating with clients who have diverse data
distributions can harm local model performance, especially with limited
training data. To address this issue, we propose FedACS, a new PFL algorithm
with an Attention-based Client Selection mechanism. FedACS integrates an
attention mechanism to enhance collaboration among clients with similar data
distributions and mitigate the data scarcity issue. It prioritizes and
allocates resources based on data similarity. We further establish the
theoretical convergence behavior of FedACS. Experiments on CIFAR10 and FMNIST
validate FedACS's superiority: by tackling non-IID data challenges and data
scarcity, it offers promising advances in personalized federated learning.
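A hedged sketch of attention-based client weighting (the exact FedACS rule is not given in the abstract; here similarity is taken as negative L2 distance between flattened model parameters, passed through a softmax):

```python
import math

def attention_aggregate(client_model, peer_models, temp=1.0):
    """Weight peer models by softmax over negative L2 distance to the
    client's own parameters, so similar-data clients dominate the
    aggregate. Models are flat parameter lists for illustration."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scores = [-dist(client_model, p) / temp for p in peer_models]
    m = max(scores)  # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    agg = [sum(w * p[i] for w, p in zip(weights, peer_models))
           for i in range(len(client_model))]
    return weights, agg
```

A peer with nearly identical parameters receives almost all of the attention mass, while a distant (dissimilar-data) peer is effectively ignored.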
( 2
min )
We explore whether Enriched Category Theory could provide the foundation of an
alternative approach to Machine Learning. This paper is the first to construct
and motivate a Machine Learning algorithm solely with Enriched Category Theory.
To supplement evidence that Category Theory can be used to motivate robust and
explainable algorithms, we show that a series of reasonable assumptions about a
dataset leads to the construction of the Nearest Neighbours Algorithm, in
particular as an extension of the original dataset using profunctors in the
category of Lawvere metric spaces. This leads to a definition of an Enriched
Nearest Neighbours Algorithm, which consequently also produces an enriched form
of the Voronoi diagram. This paper is intended to be accessible without any
knowledge of Category Theory.
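The classical algorithm at the heart of the construction only needs a distance function, and a Lawvere metric merely requires d(x, x) = 0 and the triangle inequality, so it may be asymmetric. A minimal sketch (the "one-way" distance on the reals is a standard Lawvere-metric example, not from the paper):

```python
def nearest_neighbour(query, points, d):
    """Return the point minimizing d(query, p); d only needs the
    Lawvere metric axioms (d(x, x) = 0, triangle inequality), so
    asymmetric distances are allowed."""
    return min(points, key=lambda p: d(query, p))

# One-way distance on the reals: moving left is free, moving right costs.
one_way = lambda a, b: max(b - a, 0.0)
nearest_neighbour(5.0, [1.0, 7.0, 10.0], one_way)  # -> 1.0 (distance 0)
```

Under the symmetric Euclidean distance the answer would be 7.0; the asymmetric metric changes which neighbour is "nearest", which is exactly the extra expressiveness enrichment buys.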
( 2
min )
Particle-based Variational Inference (ParVI) methods approximate the target
distribution by iteratively evolving finite weighted particle systems. Recent
advances of ParVI methods reveal the benefits of accelerated position update
strategies and dynamic weight adjustment approaches. In this paper, we propose
the first ParVI framework that possesses both accelerated position update and
dynamic weight adjustment simultaneously, named the General Accelerated
Dynamic-Weight Particle-based Variational Inference (GAD-PVI) framework.
GAD-PVI simulates the semi-Hamiltonian gradient flow on a novel
Information-Fisher-Rao space, which yields an additional decrease in the local
functional dissipation. GAD-PVI is compatible with different dissimilarity
functionals and associated smoothing approaches under three information
metrics. Experiments on both synthetic and real-world data demonstrate the
faster convergence and reduced approximation error of GAD-PVI methods over the
state-of-the-art.
( 2
min )
Randomized smoothing is currently the state-of-the-art method that provides
certified robustness for deep neural networks. However, due to its excessively
conservative nature, this method of incomplete verification often cannot
achieve an adequate certified radius on real-world datasets. One way to obtain
a larger certified radius is to use an input-specific algorithm instead of
using a fixed Gaussian filter for all data points. Several methods based on
this idea have been proposed, but they either suffer from high computational
costs or gain marginal improvement in certified radius. In this work, we show
that by exploiting the quasiconvex problem structure, we can find the optimal
certified radii for most data points with slight computational overhead. This
observation leads to an efficient and effective input-specific randomized
smoothing algorithm. We conduct extensive experiments and empirical analysis on
CIFAR-10 and ImageNet. The results show that the proposed method significantly
enhances the certified radii with low computational overhead.
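For context, the certified L2 radius in standard (Cohen et al.-style) randomized smoothing grows with the smoothing scale sigma and the lower confidence bound on the top-class probability; input-specific methods pick sigma per data point to enlarge it. A sketch of the baseline radius computation:

```python
from statistics import NormalDist

def certified_radius(sigma, p_a):
    """Certified L2 radius sigma * Phi^{-1}(p_a) of a smoothed
    classifier, where p_a > 1/2 lower-bounds the top-class probability
    under Gaussian noise of scale sigma."""
    if p_a <= 0.5:
        return 0.0  # abstain: no certificate
    return sigma * NormalDist().inv_cdf(p_a)

certified_radius(0.5, 0.9)  # ~0.64; a larger sigma or p_a widens the radius
```

The input-specific idea is to maximize this radius over sigma per data point, which the paper shows can be done efficiently by exploiting quasiconvexity.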
( 2
min )
We consider the optimization problem associated with fitting two-layer ReLU
networks with respect to the squared loss, where labels are assumed to be
generated by a target network. Focusing first on standard Gaussian inputs, we
show that the structure of spurious local minima detected by stochastic
gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of
symmetry} with respect to the target weights. A closer look at the analysis
indicates that this principle of least symmetry breaking may apply to a broader
range of settings. Motivated by this, we conduct a series of experiments which
corroborate this hypothesis for different classes of non-isotropic non-product
distributions, smooth activation functions and networks with a few layers.
( 2
min )
Inverse reinforcement learning (IRL) usually assumes the model of the reward
function is pre-specified and estimates the parameter only. However, how to
determine a proper reward model is nontrivial. A simplistic model is less
likely to contain the real reward function, while a model with high complexity
leads to substantial computation cost and risks overfitting. This paper
addresses this trade-off in IRL model selection by introducing the structural
risk minimization (SRM) method from statistical learning. SRM selects an
optimal reward function class from a hypothesis set minimizing both estimation
error and model complexity. To formulate an SRM scheme for IRL, we estimate the
policy gradient from demonstrations to serve as the empirical risk, and use an
upper bound on the Rademacher complexity of the hypothesis classes as the model
penalty. We further present a learning guarantee. In particular, we provide explicit
SRM for the common linear weighted sum setting in IRL. Simulations demonstrate
the performance and efficiency of our scheme.
( 2
min )
We present convincing empirical results on the application of Randomized
Signature Methods for non-linear, non-parametric drift estimation for a
multi-variate financial market. Even though drift estimation is notoriously
ill-defined due to a small signal-to-noise ratio, one can still try to learn
optimal non-linear maps from data to future returns for the purposes of
portfolio optimization. Randomized Signatures, in contrast to classical
signatures, scale to high market dimensions and provide features on the same
scale.
We do not contribute to the theory of Randomized Signatures here, but rather
present our empirical findings on portfolio selection in real world settings
including real market data and transaction costs.
( 2
min )
Deep learning algorithms, especially Transformer-based models, have achieved
significant performance by capturing long-range dependencies and historical
information. However, the power of convolution has not been fully investigated.
Moreover, most existing works ignore the dynamic interaction among variables
and evolutionary noise in series. Addressing these issues, we propose a
Hierarchical Memorizing Network (HMNet). In particular, a hierarchical
convolution structure is introduced to extract the information from the series
at various scales. Besides, we propose a dynamic variable interaction module to
learn the varying correlation and an adaptive denoising module to search and
exploit similar patterns to alleviate noise. These modules cooperate with
the hierarchical structure from fine to coarse granularity.
Experiments on five benchmarks demonstrate that HMNet significantly outperforms
the state-of-the-art models by 10.6% on MSE and 5.7% on MAE. Our code is
released at https://github.com/yzhHoward/HMNet.
( 2
min )
In this paper, we present XuanCe, a comprehensive and unified deep
reinforcement learning (DRL) library designed to be compatible with PyTorch,
TensorFlow, and MindSpore. XuanCe offers a wide range of functionalities,
including over 40 classical DRL and multi-agent DRL algorithms, with the
flexibility to easily incorporate new algorithms and environments. It is a
versatile DRL library that supports CPU, GPU, and Ascend, and can be executed
on various operating systems such as Ubuntu, Windows, MacOS, and EulerOS.
Extensive benchmarks conducted on popular environments including MuJoCo, Atari,
and StarCraftII multi-agent challenge demonstrate the library's impressive
performance. XuanCe is open-source and can be accessed at
https://github.com/agi-brain/xuance.git.
( 2
min )
Recent work found high mutual information between the learned representations
of large language models (LLMs) and the geospatial properties of their inputs,
hinting at an emergent internal model of space. However, that work did not
establish whether this internal space model has any causal effect on the LLMs'
behavior, leading to criticism of these findings as mere statistical
correlation. Our study focused on uncovering the causality of the spatial
representations in LLMs. In particular, we discovered potential spatial
representations in DeBERTa and GPT-Neo using representational similarity
analysis and linear and non-linear probing. Our causal intervention experiments
showed that the spatial representations influenced the model's performance on
next-word prediction and on a downstream task that relies on geospatial
information. Our experiments suggested that LLMs learn and use an internal
model of space in solving geospatial-related tasks.
( 2
min )
Leveraging knowledge from multiple tasks by introducing a small number of
task-specific parameters, known as adapters, into each transformer layer has
received much attention recently. However, adding an extra fusion layer to
implement knowledge composition not only increases the inference time but also
does not scale for some applications. To avoid these issues, we
propose a two-stage knowledge distillation algorithm called
AdapterDistillation. In the first stage, we extract task specific knowledge by
using local data to train a student adapter. In the second stage, we distill
the knowledge from the existing teacher adapters into the student adapter to
help its inference. Extensive experiments on frequently asked question
retrieval in task-oriented dialog systems validate the efficiency of
AdapterDistillation. We show that AdapterDistillation outperforms existing
algorithms in terms of accuracy, resource consumption and inference time.
( 2
min )
This paper analyses LightGCN in the context of graph recommendation
algorithms. Although Graph Convolutional Networks were originally designed for
graph classification, their non-linear operations are not always essential.
LightGCN enables linear propagation of embeddings, enhancing performance.
reproduce the original findings, assess LightGCN's robustness on diverse
datasets and metrics, and explore Graph Diffusion as an augmentation of signal
propagation in LightGCN.
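LightGCN's linear propagation is easy to state: embeddings are repeatedly multiplied by the normalized adjacency matrix, with no feature transforms or non-linearities, and the final embedding averages all layers. A minimal pure-Python sketch (list-of-lists matrices for illustration):

```python
def lightgcn_propagate(adj_norm, emb, n_layers):
    """LightGCN propagation: E^{k+1} = A_norm @ E^k with no weights and
    no non-linearity; the final embedding is the average over layers."""
    def matvec(A, E):
        return [[sum(A[i][k] * E[k][j] for k in range(len(E)))
                 for j in range(len(E[0]))] for i in range(len(A))]
    layers = [emb]
    for _ in range(n_layers):
        layers.append(matvec(adj_norm, layers[-1]))
    n, d = len(emb), len(emb[0])
    return [[sum(L[i][j] for L in layers) / len(layers) for j in range(d)]
            for i in range(n)]
```

Graph Diffusion, explored as an augmentation, effectively replaces the plain adjacency powers in this propagation with a diffusion-weighted combination.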
( 2
min )
We present Mini-BEHAVIOR, a novel benchmark for embodied AI that challenges
agents to use reasoning and decision-making skills to solve complex activities
that resemble everyday human challenges. Mini-BEHAVIOR is a fast, realistic
Gridworld environment that offers the benefits of rapid prototyping and ease of
use while preserving a symbolic level of the physical realism and complexity
found in complex embodied AI benchmarks. We introduce key features such as
procedural generation to enable the creation of countless task variations and
support open-ended learning. Mini-BEHAVIOR provides
implementations of various household tasks from the original BEHAVIOR
benchmark, along with starter code for data collection and reinforcement
learning agent training. In essence, Mini-BEHAVIOR offers a fast, open-ended
benchmark for evaluating decision-making and planning solutions in embodied AI.
It serves as a user-friendly entry point for research, simplifying the
evaluation and development of solutions while advancing the field of embodied
AI. Code is publicly available at https://github.com/StanfordVL/mini_behavior.
( 2
min )
In this paper, we propose the use of self-supervised pretraining on a large
unlabelled data set to improve the performance of a personalized voice activity
detection (VAD) model in adverse conditions. We pretrain a long short-term
memory (LSTM)-encoder using the autoregressive predictive coding (APC)
framework and fine-tune it for personalized VAD. We also propose a denoising
variant of APC, with the goal of improving the robustness of personalized VAD.
The trained models are systematically evaluated on both clean speech and speech
contaminated by various types of noise at different SNR-levels and compared to
a purely supervised model. Our experiments show that self-supervised
pretraining not only improves performance in clean conditions, but also yields
models which are more robust to adverse conditions compared to purely
supervised learning.
( 2
min )
We present a comprehensive solution to learn and improve text-to-image models
from human preference feedback. To begin with, we build ImageReward -- the
first general-purpose text-to-image human preference reward model -- to
effectively encode human preferences. Its training is based on our systematic
annotation pipeline including rating and ranking, which collects 137k expert
comparisons to date. In human evaluation, ImageReward outperforms existing
scoring models and metrics, making it a promising automatic metric for
evaluating text-to-image synthesis. On top of it, we propose Reward Feedback
Learning (ReFL), a direct tuning algorithm to optimize diffusion models against
a scorer. Both automatic and human evaluation support ReFL's advantages over
compared methods. All code and datasets are provided at
\url{https://github.com/THUDM/ImageReward}.
( 2
min )
Self-supervised learning (SSL) in audio holds significant potential across
various domains, particularly in situations where abundant, unlabeled data is
readily available at no cost. This is particularly pertinent in bioacoustics,
where biologists routinely collect extensive sound datasets from the natural
environment. In this study, we demonstrate that SSL is capable of acquiring
meaningful representations of bird sounds from audio recordings without the
need for annotations. Our experiments showcase that these learned
representations exhibit the capacity to generalize to new bird species in
few-shot learning (FSL) scenarios. Additionally, we show that selecting windows
with high bird activation for self-supervised learning, using a pretrained
audio neural network, significantly enhances the quality of the learned
representations.
( 2
min )
Foundation models, specifically Large Language Models (LLMs), have lately
gained widespread attention and adoption. Reinforcement Learning with Human
Feedback (RLHF) involves training a reward model to capture desired behaviors,
which is then used to align LLMs. These reward models are additionally used at
inference time to estimate how well LLM responses adhere to those desired
behaviors.
However, there is little work measuring how robust these reward models are to
distribution shifts. In this work, we evaluate how reward model performance -
measured via accuracy and calibration (i.e. alignment between accuracy and
confidence) - is affected by distribution shift. We show novel calibration
patterns and accuracy drops due to OOD prompts and responses, and that the
reward model is more sensitive to shifts in responses than prompts.
Additionally, we adapt an OOD detection technique commonly used in
classification to the reward model setting to detect these distribution shifts
in prompts and responses.
( 2
min )
We explore the possibility of fully replacing a plasma physics kinetic
simulator with a graph neural network-based simulator. We focus on this class
of surrogate models given the similarity between their message-passing update
mechanism and the traditional physics solver update, and the possibility of
enforcing known physical priors into the graph construction and update. We show
that our model learns the kinetic plasma dynamics of the one-dimensional plasma
model, a predecessor of contemporary kinetic plasma simulation codes, and
recovers a wide range of well-known kinetic plasma processes, including plasma
thermalization, electrostatic fluctuations about thermal equilibrium, the drag
on a fast sheet, and Landau damping. We compare the performance against the
original plasma model in terms of run-time, conservation laws, and temporal
evolution of key physical quantities. The limitations of the model are
presented and possible directions for higher-dimensional surrogate models for
kinetic plasmas are discussed.
( 2
min )
Three-dimensional native states of natural proteins display recurring and
hierarchical patterns. Yet, traditional graph-based modeling of protein
structures is often limited to operate within a single fine-grained resolution,
and lacks hourglass neural architectures to learn those high-level building
blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant
coarse-graining model that efficiently operates on all-atom protein structures.
Our model departs from current approaches that employ graph modeling, instead
focusing on local convolutional coarsening to model sequence-motif interactions
with efficient time complexity in protein length. We measure the reconstruction
capabilities of Ophiuchus across different compression rates, and compare it to
existing models. We examine the learned latent space and demonstrate its
utility through conformational interpolation. Finally, we leverage denoising
diffusion probabilistic models (DDPM) in the latent space to efficiently sample
protein structures. Our experiments demonstrate Ophiuchus to be a scalable
basis for efficient protein modeling and generation.
( 2
min )
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system
that allows control over speaker identity using natural language descriptions.
To control speaker identity within the prompt-based TTS framework, we introduce
the concept of speaker prompt, which describes voice characteristics (e.g.,
gender-neutral, young, old, and muffled) designed to be approximately
independent of speaking style. Since there is no large-scale dataset containing
speaker prompts, we first construct a dataset based on the LibriTTS-R corpus
with manually annotated speaker prompts. We then employ a diffusion-based
acoustic model with mixture density networks to model diverse speaker factors
in the training data. Unlike previous studies that rely on style prompts
describing only a limited aspect of speaker individuality, such as pitch,
speaking speed, and energy, our method utilizes an additional speaker prompt to
effectively learn the mapping from natural language descriptions to the
acoustic features of diverse speakers. Our subjective evaluation results show
that the proposed method can better control speaker characteristics than the
methods without the speaker prompt. Audio samples are available at
https://reppy4620.github.io/demo.promptttspp/.
( 2
min )
Model-based sequential approaches to discrete "black-box" optimization,
including Bayesian optimization techniques, often access the same points
multiple times for a given objective function of interest, resulting in many
steps to find the global optimum. Here, we numerically study the effect of a
postprocessing method on Bayesian optimization that strictly prohibits
duplicated samples in the dataset. We find the postprocessing method
significantly reduces the number of sequential steps to find the global
optimum, especially when the acquisition function is maximum a posteriori
estimation. Our results provide a simple but general strategy to solve the slow
convergence of Bayesian optimization for high-dimensional problems.
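The postprocessing step can be stated in a few lines: restrict the acquisition argmax to candidates not already in the dataset (the function name and discrete-candidate setting are illustrative):

```python
def next_sample(candidates, acquisition, visited):
    """Return the acquisition-maximizing candidate that has not been
    sampled yet; prohibiting duplicates forces genuinely new queries."""
    fresh = [x for x in candidates if x not in visited]
    return max(fresh, key=acquisition) if fresh else None

next_sample([0, 1, 2], lambda x: x, visited={2})  # -> 1
```

Without this filter, a sharply peaked acquisition function (such as a MAP-style one) can keep proposing the same point and stall the search.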
( 2
min )
We propose to enhance the training of physics-informed neural networks
(PINNs). To this aim, we introduce nonlinear additive and multiplicative
preconditioning strategies for the widely used L-BFGS optimizer. The nonlinear
preconditioners are constructed by utilizing the Schwarz domain-decomposition
framework, where the parameters of the network are decomposed in a layer-wise
manner. Through a series of numerical experiments, we demonstrate that both
additive and multiplicative preconditioners significantly improve the
convergence of the standard L-BFGS optimizer, while providing more accurate
solutions of the underlying partial differential equations. Moreover, the
additive preconditioner is inherently parallel, thus giving rise to a novel
approach to model parallelism.
( 2
min )
The training process of ReLU neural networks often exhibits complicated
nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose
significant challenges for theoretical analysis. Therefore, most previous
theoretical works on the optimization dynamics of neural networks focus either
on local analysis (like the end of training) or approximate linear models (like
Neural Tangent Kernel). In this work, we conduct a complete theoretical
characterization of the training process of a two-layer ReLU network trained by
Gradient Flow on linearly separable data. In this specific setting, our
analysis captures the whole optimization process starting from random
initialization to final convergence. Despite the relatively simple model and
data that we studied, we reveal four different phases from the whole training
process showing a general simplifying-to-complicating learning trend. Specific
nonlinear behaviors can also be precisely identified and captured
theoretically, such as initial condensation, saddle-to-plateau dynamics,
plateau escape, changes of activation patterns, learning with increasing
complexity, etc.
( 2
min )
We propose a new method to estimate a root-directed spanning tree from
extreme data. A prominent example is a river network, to be discovered from
extreme flow measured at a set of stations. Our new algorithm utilizes
qualitative aspects of a max-linear Bayesian network, which has been designed
for modelling causality in extremes. The algorithm estimates bivariate scores
and returns a root-directed spanning tree. It performs extremely well on
benchmark data and new data. We prove that the new estimator is consistent
under a max-linear Bayesian network model with noise. We also assess its
strengths and limitations in a small simulation study.
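As a hedged illustration of the pipeline (not the paper's exact estimator), once pairwise scores are in hand a root-directed spanning tree can be read off with a Prim-style maximum spanning tree, oriented away from a chosen root:

```python
def max_spanning_tree(scores, root=0):
    """Prim-style maximum spanning tree over a symmetric score matrix,
    returned as parent pointers directed away from `root` -- a simple
    stand-in for score-based river-network recovery."""
    n = len(scores)
    in_tree = {root}
    parent = {root: None}
    while len(in_tree) < n:
        # Highest-scoring edge leaving the current tree.
        u, v = max(((i, j) for i in in_tree for j in range(n)
                    if j not in in_tree), key=lambda e: scores[e[0]][e[1]])
        parent[v] = u
        in_tree.add(v)
    return parent

# Stations 0-1 score highest, then 1-2: tree 0 -> 1 -> 2.
max_spanning_tree([[0.0, 5.0, 1.0],
                   [5.0, 0.0, 3.0],
                   [1.0, 3.0, 0.0]], root=0)
```

In the river-network application, nodes are gauging stations, scores are the estimated bivariate extremal dependencies, and the root is the outlet.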
( 2
min )
We present a short tutorial on the use of the R gasper package. Gasper is a
package dedicated to signal processing on graphs. It also provides an
interface to the SuiteSparse Matrix Collection.
( 2
min )
This paper studies experimental designs for estimation and inference on
policies with spillover effects. Units are organized into a finite number of
large clusters and interact in unknown ways within each cluster. First, we
introduce a single-wave experiment that, by varying the randomization across
cluster pairs, estimates the marginal effect of a change in treatment
probabilities, taking spillover effects into account. Using the marginal
effect, we propose a test for policy optimality. Second, we design a
multiple-wave experiment to estimate welfare-maximizing treatment rules. We
provide strong theoretical guarantees and an implementation in a large-scale
field experiment.
( 2
min )
The use of transfer learning with deep neural networks has become increasingly
widespread for deploying well-tested computer vision systems to newer
domains, especially those with limited datasets. We describe a transfer
learning use case for a domain with a data-starved regime, having fewer than
100 labeled target samples. We evaluate the effectiveness of convolutional
feature extraction and fine-tuning of overparameterized models with respect to
the size of target training data, as well as their generalization performance
on data with covariate shift, or out-of-distribution (OOD) data. Our
experiments demonstrate that both overparameterization and feature reuse
contribute to the successful application of transfer learning in training image
classifiers in data-starved regimes. We provide visual explanations to support
our findings and conclude that transfer learning enhances the performance of
CNN architectures in data-starved regimes.
( 2
min )
These lecture notes give a statistical perspective on the foundations of
reinforcement learning and interactive decision making. We present a unifying
framework for addressing the exploration-exploitation dilemma using frequentist
and Bayesian approaches, with connections and parallels between supervised
learning/estimation and decision making as an overarching theme. Special
attention is paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual bandits,
structured bandits, and reinforcement learning with high-dimensional feedback.
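As a taste of the bandit material, the classic frequentist answer to the exploration-exploitation dilemma in the multi-armed case is the UCB1 index: pull each arm once, then the arm maximizing its empirical mean plus a confidence bonus. A minimal sketch:

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1: after one warm-start pull per arm, choose the arm
    maximizing mean + sqrt(2 ln t / n_i). `pull(a)` returns the
    observed reward of arm a."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            a = t  # warm start: try every arm once
        else:
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t + 1) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts
```

On a toy problem where arm 1 deterministically pays 1 and the others pay 0, the pull counts concentrate on arm 1 while the shrinking bonuses still force occasional exploration of the rest.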
( 2
min )
This paper introduces novel alternate training procedures for hard-parameter
sharing Multi-Task Neural Networks (MTNNs). Traditional MTNN training faces
challenges in managing conflicting loss gradients, often yielding sub-optimal
performance. The proposed alternate training method updates shared and
task-specific weights alternately, exploiting the multi-head architecture of
the model. This approach reduces computational costs, enhances training
regularization, and improves generalization. Convergence properties similar to
those of the classical stochastic gradient method are established. Empirical
experiments demonstrate delayed overfitting, improved prediction, and reduced
computational demands. In summary, our alternate training procedures offer a
promising advancement for the training of hard-parameter sharing MTNNs.
( 2
min )
We study the problem of learning linear temporal logic (LTL) formulas from
examples, as a first step towards expressing a property separating positive and
negative instances in a way that is comprehensible for humans. In this paper we
initiate the study of the computational complexity of the problem. Our main
results are hardness results: we show that the LTL learning problem is
NP-complete, both for the full logic and for almost all of its fragments. This
motivates the search for efficient heuristics, and highlights the complexity of
expressing separating properties in concise natural language.
( 2
min )
Generalized Labeled Multi-Bernoulli (GLMB) densities arise in a host of
multi-object system applications analogous to Gaussians in single-object
filtering. However, computing the GLMB filtering density requires solving
NP-hard problems. To alleviate this computational bottleneck, we develop a
linear complexity Gibbs sampling framework for GLMB density computation.
Specifically, we propose a tempered Gibbs sampler that exploits the structure
of the GLMB filtering density to achieve an $\mathcal{O}(T(P+M))$ complexity,
where $T$ is the number of iterations of the algorithm, and $P$ and $M$ are the
numbers of hypothesized objects and measurements, respectively. This innovation
enables the GLMB
filter implementation to be reduced from an $\mathcal{O}(TP^{2}M)$ complexity
to $\mathcal{O}(T(P+M+\log T)+PM)$. Moreover, the proposed framework provides
the flexibility for trade-offs between tracking performance and computational
load. Convergence of the proposed Gibbs sampler is established, and numerical
studies are presented to validate the proposed GLMB filter implementation.
( 2
min )
We introduce a new empirical Bayes approach for large-scale multiple linear
regression. Our approach combines two key ideas: (i) the use of flexible
"adaptive shrinkage" priors, which approximate the nonparametric family of
scale mixture of normal distributions by a finite mixture of normal
distributions; and (ii) the use of variational approximations to efficiently
estimate prior hyperparameters and compute approximate posteriors. Combining
these two ideas results in fast and flexible methods, with computational speed
comparable to fast penalized regression methods such as the Lasso, and with
superior prediction accuracy across a wide range of scenarios. Furthermore, we
show that the posterior mean from our method can be interpreted as solving a
penalized regression problem, with the precise form of the penalty function
being learned from the data by directly solving an optimization problem (rather
than being tuned by cross-validation). Our methods are implemented in an R
package, mr.ash.alpha, available from
https://github.com/stephenslab/mr.ash.alpha
( 2
min )
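The "adaptive shrinkage" prior in (i) can be illustrated for a single coefficient: under a finite mixture of zero-mean normals, the posterior mean is a data-dependent shrinkage of the observation. This is a sketch of the prior family only, not the package's variational fitting in (ii); the grid of variances and weights is invented.

```python
import numpy as np

def ash_posterior_mean(x, sigma2, pis, s2s):
    # Posterior mean of theta given x ~ N(theta, sigma2) under the prior
    # sum_k pis[k] * N(0, s2s[k]).
    pis, s2s = np.asarray(pis, float), np.asarray(s2s, float)
    # Marginal likelihood of x under each mixture component: N(0, s2 + sigma2).
    marg = pis * np.exp(-0.5 * x ** 2 / (s2s + sigma2)) \
        / np.sqrt(2 * np.pi * (s2s + sigma2))
    resp = marg / marg.sum()          # posterior component responsibilities
    shrink = s2s / (s2s + sigma2)     # per-component linear shrinkage factor
    return float(resp @ (shrink * x))

# Grid of prior variances from near-zero (strong shrinkage) to large.
s2s = [1e-4, 0.01, 0.1, 1.0, 10.0]
pis = [0.5, 0.2, 0.15, 0.1, 0.05]
small = ash_posterior_mean(0.5, 1.0, pis, s2s)  # weak signal: shrunk hard
large = ash_posterior_mean(6.0, 1.0, pis, s2s)  # strong signal: barely shrunk
```

The adaptivity is visible in the two calls: a weak observation is pulled almost all the way to zero, while a strong one is left nearly untouched.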
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Among the inherent
sources, we look more closely at site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of probabilistic clinical models.
( 2
min )
Model-based sequential approaches to discrete "black-box" optimization,
including Bayesian optimization techniques, often access the same points
multiple times for a given objective function of interest, resulting in many
steps to find the global optimum. Here, we numerically study the effect of a
postprocessing method on Bayesian optimization that strictly prohibits
duplicated samples in the dataset. We find the postprocessing method
significantly reduces the number of sequential steps to find the global
optimum, especially when the acquisition function is based on maximum a
posteriori estimation. Our results provide a simple but general strategy to address the slow
convergence of Bayesian optimization for high-dimensional problems.
( 2
min )
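The postprocessing idea is simple to sketch: mask out already-evaluated points before taking the acquisition argmax. The candidate grid, objective, and stand-in acquisition score below are invented for illustration; a real implementation would score candidates with a Gaussian-process posterior.

```python
import numpy as np

rng = np.random.default_rng(1)

candidates = np.arange(32)              # discrete search space (toy)
objective = lambda x: -(x - 21) ** 2    # hypothetical black-box function

visited, observations = [], {}

def acquisition(x):
    # Stand-in acquisition score (not a real GP posterior): prefer points
    # near the current best observation, plus exploration noise.
    if not observations:
        return rng.random()
    best = max(observations, key=observations.get)
    return -abs(x - best) + rng.normal(scale=0.5)

for _ in range(12):
    scores = np.array([acquisition(x) for x in candidates], dtype=float)
    # Postprocessing step: strictly prohibit duplicated samples in the dataset.
    scores[visited] = -np.inf
    x_next = int(np.argmax(scores))
    visited.append(x_next)
    observations[x_next] = objective(x_next)
```

Setting visited scores to negative infinity guarantees every sequential step evaluates a new point, which is exactly the duplication ban the abstract studies.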
We propose a new method to estimate a root-directed spanning tree from
extreme data. A prominent example is a river network, to be discovered from
extreme flow measured at a set of stations. Our new algorithm utilizes
qualitative aspects of a max-linear Bayesian network, which has been designed
for modelling causality in extremes. The algorithm estimates bivariate scores
and returns a root-directed spanning tree. It performs extremely well on
benchmark data and new data. We prove that the new estimator is consistent
under a max-linear Bayesian network model with noise. We also assess its
strengths and limitations in a small simulation study.
( 2
min )
We consider the optimization problem associated with fitting two-layer ReLU
networks with respect to the squared loss, where labels are assumed to be
generated by a target network. Focusing first on standard Gaussian inputs, we
show that the structure of spurious local minima detected by stochastic
gradient descent (SGD) is, in a well-defined sense, the \emph{least loss of
symmetry} with respect to the target weights. A closer look at the analysis
indicates that this principle of least symmetry breaking may apply to a broader
range of settings. Motivated by this, we conduct a series of experiments which
corroborate this hypothesis for different classes of non-isotropic non-product
distributions, smooth activation functions and networks with a few layers.
( 2
min )
The use of transfer learning with deep neural networks has become
increasingly widespread for deploying well-tested computer vision systems to newer
domains, especially those with limited datasets. We describe a transfer
learning use case for a domain with a data-starved regime, having fewer than
100 labeled target samples. We evaluate the effectiveness of convolutional
feature extraction and fine-tuning of overparameterized models with respect to
the size of target training data, as well as their generalization performance
on data with covariate shift, or out-of-distribution (OOD) data. Our
experiments demonstrate that both overparameterization and feature reuse
contribute to the successful application of transfer learning in training image
classifiers in data-starved regimes. We provide visual explanations to support
our findings and conclude that transfer learning enhances the performance of
CNN architectures in data-starved regimes.
( 2
min )
One of the most recent and fascinating breakthroughs in artificial
intelligence is ChatGPT, a chatbot which can simulate human conversation.
ChatGPT is an instance of GPT-4, a language model based on generative
pre-trained transformers. So if one wants to study, from a theoretical point of
view, how powerful such artificial intelligence can be, one approach is to
consider transformer networks and to ask which problems these networks can
solve theoretically. Here it matters not only what kinds of models these
networks can approximate, and how well they can generalize from a concrete data
set by choosing the best possible approximation, but also how well the
optimization of such a transformer network on a concrete data set works. In
this article we consider all three of these aspects simultaneously and show a
theoretical upper bound on the misclassification probability of a transformer
network fitted to the observed data. For
simplicity we focus in this context on transformer encoder networks which can
be applied to define an estimate in the context of a classification problem
involving natural language.
( 2
min )
We propose a new method called the N-particle underdamped Langevin algorithm
for optimizing a special class of non-linear functionals defined over the space
of probability measures. Examples of problems with this formulation include
training neural networks in the mean-field regime, density estimation, and
kernel Stein discrepancy minimization. Our algorithm is based on a novel
space-time discretization of the mean-field underdamped Langevin dynamics, for
which we provide a new, fast mixing guarantee. In addition, we demonstrate that
our algorithm converges globally in total variation distance, bridging the
theoretical gap between the dynamics and its practical implementation.
( 2
min )
These lecture notes give a statistical perspective on the foundations of
reinforcement learning and interactive decision making. We present a unifying
framework for addressing the exploration-exploitation dilemma using frequentist
and Bayesian approaches, with connections and parallels between supervised
learning/estimation and decision making as an overarching theme. Special
attention is paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual bandits,
structured bandits, and reinforcement learning with high-dimensional feedback.
( 2
min )
Whether abundant, endangered or extinct, animal species are the focus of countless AI-powered conservation projects. These initiatives, accelerated using NVIDIA GPUs, deep learning software and robotics technology, are alerting conservationists to poaching threats, powering more sustainable aquaculture and helping scientists monitor coral reef health. Take a safari through the NVIDIA Blog’s top animal stories.
( 7
min )
Before ringing in the new year, GeForce NOW is taking a look back at a 2023 full of top-notch gaming. Explore GeForce NOW’s year in review, which brought more hit games, improved service features and the launch of the Ultimate membership tier. Plus, GFN Thursday is raising a toast to the GeForce NOW community.
( 7
min )
In this paper, we study asynchronous stochastic approximation algorithms
without communication delays. Our main contribution is a stability proof for
these algorithms that extends a method of Borkar and Meyn by accommodating more
general noise conditions. We also derive convergence results from this
stability result and discuss their application in important average-reward
reinforcement learning problems.
( 2
min )
Out-of-distribution (OOD) detection is an important topic for real-world
machine learning systems, but settings with limited in-distribution samples
have been underexplored. Such few-shot OOD settings are challenging, as models
have scarce opportunities to learn the data distribution before being tasked
with identifying OOD samples. Indeed, we demonstrate that recent
state-of-the-art OOD methods fail to outperform simple baselines in the
few-shot setting. We thus propose a hypernetwork framework called HyperMix,
using Mixup on the generated classifier parameters, as well as a natural
out-of-episode outlier exposure technique that does not require an additional
outlier dataset. We conduct experiments on CIFAR-FS and MiniImageNet,
significantly outperforming other OOD methods in the few-shot regime.
( 2
min )
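The Mixup-on-parameters idea can be sketched directly: interpolate two generated classifier heads with a Beta-distributed coefficient. The two heads below are hypothetical stand-ins; the hypernetwork that would generate them, and the matching interpolation of labels, are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

def mixup_parameters(theta_a, theta_b, alpha=0.4, rng=rng):
    # Mixup applied to parameters rather than inputs: convexly combine two
    # generated classifier heads with a Beta(alpha, alpha) coefficient.
    lam = rng.beta(alpha, alpha)
    return lam * theta_a + (1 - lam) * theta_b, lam

# Two hypothetical classifier heads, standing in for hypernetwork outputs.
theta_a = rng.normal(size=16)
theta_b = rng.normal(size=16)
theta_mix, lam = mixup_parameters(theta_a, theta_b)
```

As in input-space Mixup, training against such interpolated heads acts as a regularizer, which is especially valuable when in-distribution samples are scarce.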
Recent advancements in sensing and communication facilitate obtaining
high-frequency real-time data from various physical systems like power
networks, climate systems, biological networks, etc. However, since the data
are recorded by physical sensors, it is natural that the obtained data is
corrupted by measurement noise. In this paper, we present a novel algorithm for
online real-time learning of dynamical systems from noisy time-series data,
which employs the Robust Koopman operator framework to mitigate the effect of
measurement noise. The proposed algorithm has three main advantages: a) it
allows for online real-time monitoring of a dynamical system; b) it obtains a
linear representation of the underlying dynamical system, thus enabling the
user to use linear systems theory for analysis and control of the system; c) it
is computationally fast and less intensive than the popular Extended Dynamic
Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the
proposed algorithm by applying it to identify the Van der Pol oscillator, the
IEEE 68 bus system, and a ring network of Van der Pol oscillators.
( 2
min )
Many companies rely on APIs of managed AI models such as OpenAI's GPT-4 to
create AI-enabled experiences in their products. Along with the benefits of
ease of use and shortened time to production, this reliance on proprietary APIs
has downsides in terms of model control, performance reliability, up-time
predictability, and cost. At the same time, there has been a flurry of open
source small language models (SLMs) that have been made available for
commercial use. However, their readiness to replace existing capabilities
remains unclear, and a systematic approach to test these models is not readily
available. In this paper, we present a systematic evaluation methodology for,
and characterization of, modern open source SLMs and their trade-offs when
replacing proprietary LLM APIs for a real-world product feature. We have
designed SLaM, an automated analysis tool that enables the quantitative and
qualitative testing of product features utilizing arbitrary SLMs. Using SLaM,
we examine both the quality and the performance characteristics of modern SLMs
relative to an existing customer-facing OpenAI-based implementation. We find
that, across 9 SLMs and 29 variants, we observe competitive quality of results
for our use case, significantly more consistent performance, and a cost
reduction of 5x-29x compared to OpenAI GPT-4.
( 3
min )
In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotic optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions, which are enabled by a truncation method.
( 2
min )
This paper explores the image synthesis capabilities of GPT-4, a leading
multi-modal large language model. We establish a benchmark for evaluating the
fidelity of texture features in images generated by GPT-4, comprising manually
painted pictures and their AI-generated counterparts. The contributions of this
study are threefold: First, we provide an in-depth analysis of the fidelity of
image synthesis features based on GPT-4, marking the first such study on this
state-of-the-art model. Second, the quantitative and qualitative experiments
fully reveal the limitations of the GPT-4 model in image synthesis. Third, we
have compiled a unique benchmark of manual drawings and corresponding
GPT-4-generated images, introducing a new task to advance fidelity research in
AI-generated content (AIGC). The dataset is available at:
\url{https://github.com/rickwang28574/DeepArt}.
( 2
min )
This paper presents a Gaussian Process (GP) framework, a non-parametric
technique widely acknowledged for regression and classification tasks, to
address inverse problems in mean field games (MFGs). By leveraging GPs, we aim
to recover agents' strategic actions and the environment's configurations from
partial and noisy observations of the population of agents and the setup of the
environment. Our method is a probabilistic tool to infer the behaviors of
agents in MFGs from data in scenarios where the comprehensive dataset is either
inaccessible or contaminated by noise.
( 2
min )
We propose a simple multivariate normality test based on Kac-Bernstein's
characterization, which can be conducted by utilising existing statistical
independence tests for sums and differences of data samples. We also carry out
an empirical investigation, which reveals that for high-dimensional data, the
proposed approach may be more efficient than the alternatives. The
accompanying code repository is provided at \url{https://shorturl.at/rtuy5}.
( 2
min )
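A rough sketch of the idea (not the paper's actual test): by the Kac-Bernstein characterization, for i.i.d. X and Y the sum X + Y and difference X - Y are independent exactly when the distribution is normal. Since sums and differences are uncorrelated for any i.i.d. data, a plain correlation cannot detect the dependence, so below we correlate the sums with the squared differences instead; a faithful implementation would use a full statistical independence test.

```python
import numpy as np

rng = np.random.default_rng(3)

def kb_stat(data, rng=rng):
    # Split the sample into halves X, Y; under normality X + Y and X - Y
    # are independent, so sums are uncorrelated with squared differences.
    data = data - data.mean(axis=0)
    n = (len(data) // 2) * 2
    perm = rng.permutation(len(data))[:n]
    X, Y = data[perm[: n // 2]], data[perm[n // 2:]]
    S, D2 = X + Y, (X - Y) ** 2
    return max(abs(np.corrcoef(S[:, j], D2[:, j])[0, 1])
               for j in range(data.shape[1]))

normal_stat = kb_stat(rng.normal(size=(4000, 3)))      # near zero
skewed_stat = kb_stat(rng.exponential(size=(4000, 3)))  # clearly nonzero
```

For skewed data the statistic is driven by the third moment, since Cov(X + Y, (X - Y)^2) = 2 E[X^3] for centered i.i.d. samples.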
We explore the applications of random matrix theory (RMT) in the training of
deep neural networks (DNNs), focusing on layer pruning, that is, reducing the
number of DNN parameters (weights). Our numerical results show that this
pruning drastically reduces the number of parameters without reducing the
accuracy of DNNs and CNNs. Moreover, pruning the fully connected DNNs actually
increases the accuracy and decreases the variance for random initializations.
Our numerics indicate that this enhancement in accuracy is due to the
simplification of the loss landscape. We next provide rigorous mathematical
underpinning of these numerical results by proving the RMT-based Pruning
Theorem. Our results offer valuable insights into the practical application of
RMT for the creation of more efficient and accurate deep-learning models.
( 2
min )
This paper proposes an efficient optimizer called AdaPlus which integrates
Nesterov momentum and precise stepsize adjustment on the basis of AdamW. AdaPlus
combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does
not introduce any extra hyper-parameters. We perform extensive experimental
evaluations on three machine learning tasks to validate the effectiveness of
AdaPlus. The experiment results validate that AdaPlus (i) among all the
evaluated adaptive methods, performs most comparably with (even slightly better
than) SGD with momentum on image classification tasks and (ii) outperforms
other state-of-the-art optimizers on language modeling tasks and exhibits
high stability when training GANs. The experiment code of AdaPlus will
be accessible at: https://github.com/guanleics/AdaPlus.
( 2
min )
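Since the abstract describes AdaPlus only at a high level, the following is a generic sketch of an update that combines Nesterov-style momentum with AdamW's decoupled weight decay; it is not the authors' exact rule (which also draws on AdaBelief), and all hyperparameter values are illustrative defaults.

```python
import numpy as np

def nadamw_step(w, g, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    # Adam-style moment estimates with bias correction, a Nesterov lookahead
    # on the first moment, and AdamW-style decoupled weight decay.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    m_nes = b1 * m_hat + (1 - b1) * g / (1 - b1 ** t)  # Nesterov lookahead
    w = w - lr * (m_nes / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

# Sanity check on a convex toy problem: minimize f(w) = 0.5 * ||w||^2,
# whose gradient is simply w.
rng = np.random.default_rng(6)
w = rng.normal(size=5)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 2001):
    w, m, v = nadamw_step(w, g=w, m=m, v=v, t=t)
```

Note the decoupled decay term `wd * w` is added outside the adaptive normalization, which is the defining feature AdamW contributes to such hybrids.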
The growth of network-connected devices has led to an exponential increase in
data generation, creating significant challenges for efficient data analysis.
This data is generated continuously, creating a dynamic flow known as a data
stream. The characteristics of a data stream may change dynamically, and this
change is known as concept drift. Consequently, a method for handling data
streams must efficiently reduce their volume while dynamically adapting to
these changing characteristics. This paper proposes a simple online vector
quantization method for concept drift. The proposed method identifies and
replaces units with low win probability through remove-birth updating, thus
achieving a rapid adaptation to concept drift. Furthermore, the results of this
study show that the proposed method generates very few dead units even in the
presence of concept drift. This study also suggests that some metrics
calculated from the proposed method will be helpful for drift detection.
( 2
min )
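A minimal sketch of the remove-birth idea, assuming a simple winner-take-all codebook update and an exponential running estimate of win probabilities; the codebook size, rates, and thresholds are invented, and the drifting stream is synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

K, dim = 8, 2
units = rng.normal(size=(K, dim))    # codebook vectors
win_p = np.full(K, 1.0 / K)          # running win-probability estimates
eta, beta, low = 0.1, 0.01, 0.01     # learning rate, decay, dead threshold

def observe(x):
    global win_p
    k = int(np.argmin(((units - x) ** 2).sum(axis=1)))  # winning unit
    units[k] += eta * (x - units[k])                     # online VQ update
    onehot = np.zeros(K)
    onehot[k] = 1.0
    win_p = (1 - beta) * win_p + beta * onehot           # update win probs
    # Remove-birth: respawn the lowest-probability unit near the input.
    dead = int(np.argmin(win_p))
    if win_p[dead] < low:
        units[dead] = x + rng.normal(scale=0.01, size=dim)
        win_p[dead] = 1.0 / K

# Stream from one regime, then drift the concept to a new region.
for t in range(3000):
    center = np.zeros(dim) if t < 1500 else np.full(dim, 5.0)
    observe(center + rng.normal(scale=0.5, size=dim))
```

After the drift, units stranded in the old region stop winning, their estimated win probability decays below the threshold, and they are reborn inside the new data region, which is the rapid-adaptation mechanism the abstract describes.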
Multi-query attention (MQA), which only uses a single key-value head,
drastically speeds up decoder inference. However, MQA can lead to quality
degradation, and moreover it may not be desirable to train a separate model
just for faster inference. We (1) propose a recipe for uptraining existing
multi-head language model checkpoints into models with MQA using 5% of original
pre-training compute, and (2) introduce grouped-query attention (GQA), a
generalization of multi-query attention which uses an intermediate (more than
one, fewer than the number of query heads) number of key-value heads. We show that
uptrained GQA achieves quality close to multi-head attention with comparable
speed to MQA.
( 2
min )
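The key-value head sharing can be sketched in a few lines of numpy; this is a generic illustration of the attention pattern (each contiguous group of query heads reads the same key-value head), not the paper's uptraining recipe, and the head counts and dimensions are invented.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    # q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), where
    # n_kv_heads divides n_q_heads. Each group of query heads shares
    # one key-value head (n_kv_heads=1 recovers MQA; =n_q_heads, MHA).
    n_q_heads, seq, d = q.shape
    group = n_q_heads // k.shape[0]
    k = np.repeat(k, group, axis=0)   # broadcast kv heads across groups
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (n_q_heads, seq, seq)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                               # (n_q_heads, seq, d)

rng = np.random.default_rng(5)
q = rng.normal(size=(8, 4, 16))   # 8 query heads
k = rng.normal(size=(2, 4, 16))   # 2 shared key-value heads (GQA)
v = rng.normal(size=(2, 4, 16))
out = grouped_query_attention(q, k, v)
```

The inference saving comes from the KV cache: only 2 of the 8 heads' keys and values need to be stored and streamed, while query-side capacity is preserved.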
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the rewards distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
( 2
min )
We study the consistency of surrogate risks for robust binary classification.
It is common to learn robust classifiers by adversarial training, which seeks
to minimize the expected $0$-$1$ loss when each example can be maliciously
corrupted within a small ball. We give a simple and complete characterization
of the set of surrogate loss functions that are \emph{consistent}, i.e., that
can replace the $0$-$1$ loss without affecting the minimizing sequences of the
original adversarial risk, for any data distribution. We also prove a
quantitative version of adversarial consistency for the $\rho$-margin loss. Our
results reveal that the class of adversarially consistent surrogates is
substantially smaller than in the standard setting, where many common
surrogates are known to be consistent.
( 2
min )
The current trend in developing machine learning models for reading
comprehension and logical reasoning tasks is focused on improving the models'
abilities to understand and utilize logical rules. This work provides a novel
loss function and accompanying model architecture that has more interpretable
components than comparable models, by representing a common strategy employed
by humans when given reading comprehension and logical reasoning tasks. Our
strategy involves emphasizing relative accuracy over
absolute accuracy and can theoretically produce the correct answer with
incomplete knowledge. We examine the effectiveness of this strategy to solve
reading comprehension and logical reasoning questions. The models were
evaluated on the ReClor dataset, a challenging reading comprehension and
logical reasoning benchmark. We propose the polytuplet loss function, which
forces prioritization of learning the relative correctness of answer choices
over learning the true accuracy of each choice. Our results indicate that
models employing polytuplet loss outperform existing baseline models, though
further research is required to quantify the benefits it may present.
( 2
min )
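The relative-accuracy idea can be illustrated with a generic margin loss over answer choices; this is a hedged sketch in the spirit of the approach, not the paper's exact polytuplet formulation, and the scores are invented.

```python
import numpy as np

def relative_margin_loss(scores, correct, margin=1.0):
    # Penalize each wrong choice whose score comes within `margin` of the
    # correct choice's score: only relative ordering matters, not the
    # absolute score assigned to any choice.
    scores = np.asarray(scores, float)
    gaps = scores[correct] - np.delete(scores, correct)
    return float(np.maximum(0.0, margin - gaps).sum())

# The correct choice only needs to beat the others by the margin; shifting
# all scores by a constant leaves the loss unchanged.
loss_ranked = relative_margin_loss([3.0, 0.5, 0.2, 0.1], correct=0)
loss_shifted = relative_margin_loss([-7.0, -9.5, -9.8, -9.9], correct=0)
```

With incomplete knowledge a model can still drive such a loss to zero by ranking the right answer above the alternatives, mirroring the human test-taking strategy the abstract invokes.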
We introduce a new approach for generating sequences of implied volatility
(IV) surfaces across multiple assets that is faithful to historical prices. We
do so using a combination of functional data analysis and neural stochastic
differential equations (SDEs) combined with a probability integral transform
penalty to reduce model misspecification. We demonstrate that learning the
joint dynamics of IV surfaces and prices produces market scenarios that are
consistent with historical features and lie within the sub-manifold of surfaces
that are essentially free of static arbitrage. Finally, we demonstrate that
delta hedging using the simulated surfaces generates profit and loss (P&L)
distributions that are consistent with realised P&Ls.
( 2
min )
Arunachalam and de Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this, ostensibly surprising, message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
In nonstationary bandit learning problems, the decision-maker must
continually gather information and adapt their action selection as the latent
state of the environment evolves. In each time period, some latent optimal
action maximizes expected reward under the environment state. We view the
optimal action sequence as a stochastic process, and take an
information-theoretic approach to analyze attainable performance. We bound
limiting per-period regret in terms of the entropy rate of the optimal action
process. The bound applies to a wide array of problems studied in the
literature and reflects the problem's information structure through its
information-ratio.
( 2
min )
Federated Learning (FL) and Split Learning (SL) are two popular paradigms of
distributed machine learning. By offloading the computation-intensive portions
to the server, SL is promising for deep model training on resource-constrained
devices, yet it still lacks a rigorous convergence analysis. In this paper, we
derive the convergence guarantees of Sequential SL (SSL, the vanilla case of SL
that conducts the model training in sequence) for strongly/general/non-convex
objectives on heterogeneous data. Notably, the derived guarantees suggest that
SSL is better than Federated Averaging (FedAvg, the most popular algorithm in
FL) on heterogeneous data. We validate the counterintuitive analysis result
empirically on extremely heterogeneous data.
( 2
min )
We study stochastic delayed feedback in general multi-agent sequential
decision making, which includes bandits, single-agent Markov decision processes
(MDPs), and Markov games (MGs). We propose a novel reduction-based framework,
which turns any multi-batched algorithm for sequential decision making with
instantaneous feedback into a sample-efficient algorithm that can handle
stochastic delays in sequential decision making. By plugging different
multi-batched algorithms into our framework, we provide several examples
demonstrating that our framework not only matches or improves existing results
for bandits, tabular MDPs, and tabular MGs, but also provides the first line of
studies on delays in sequential decision making with function approximation. In
summary, we provide a complete set of sharp results for multi-agent sequential
decision making with delayed feedback.
( 2
min )
Understanding the loss of information in spectral analytics is a crucial
first step towards finding root causes for failures and uncertainties using
spectral data in artificial intelligence models built from modern complex data
science applications. Here, we show from an elementary Shannon entropy model
analysis with quantum statistics of Gaussian distributed spectral data, that
the relative loss of information from dimensionality reduction due to the
projection of an initial five-dimensional dataset onto two-dimensional diagrams
is less than one percent in the parameter range of small data sets with sample
sizes on the order of a few hundred data samples. From our analysis, we also
conclude that the density and expectation value of the entropy probability
distribution increase with the sample number and sample size using artificial
data models derived from random sampling Monte Carlo simulation methods.
( 2
min )
We present a convolutional framework which significantly reduces the
complexity and thus, the computational effort for distributed reinforcement
learning control of dynamical systems governed by partial differential
equations (PDEs). Exploiting translational invariances, the high-dimensional
distributed control problem can be transformed into a multi-agent control
problem with many identical, uncoupled agents. Furthermore, using the fact that
information is transported with finite velocity in many cases, the dimension of
the agents' environment can be drastically reduced using a convolution
operation over the state space of the PDE. In this setting, the complexity can
be flexibly adjusted via the kernel width or by using a stride greater than
one. Moreover, scaling from smaller to larger systems -- or the transfer
between different domains -- becomes a straightforward task requiring little
effort. We demonstrate the performance of the proposed framework using several
PDE examples with increasing complexity, where stabilization is achieved by
training a low-dimensional deep deterministic policy gradient agent using
minimal computing resources.
( 2
min )
Effective representation of molecules is a crucial factor affecting the
performance of artificial intelligence models. This study introduces a
flexible, fragment-based, multiscale molecular representation framework called
t-SMILES (tree-based SMILES) with three code algorithms: TSSA (t-SMILES with
Shared Atom), TSDY (t-SMILES with Dummy Atom) and TSID (t-SMILES with ID). It
describes molecules using SMILES-type strings obtained by performing a
breadth-first search on a full binary tree formed from a fragmented molecular
graph. Systematic evaluations using JTVAE, BRICS, MMPA, and Scaffold show the
feasibility of constructing a multilingual molecular description system, where
various descriptions complement each other, enhancing the overall performance.
Additionally, it exhibits impressive performance on low-resource datasets,
whether the model is original, data augmented, or pre-training fine-tuned. It
significantly outperforms classical SMILES, DeepSMILES, SELFIES and baseline
models in goal-directed tasks. Furthermore, it surpasses state-of-the-art
fragment, graph and SMILES based approaches on ChEMBL, Zinc, and QM9.
( 2
min )
We consider (nonparametric) sparse additive models (SpAM) for classification.
The design of a SpAM classifier is based on minimizing the logistic loss with a
sparse group Lasso/Slope-type penalties on the coefficients of univariate
additive components' expansions in orthonormal series (e.g., Fourier or
wavelets). The resulting classifier is inherently adaptive to the unknown
sparsity and smoothness. We show that under certain sparse group restricted
eigenvalue condition it is nearly-minimax (up to log-factors) simultaneously
across the entire range of analytic, Sobolev and Besov classes. The performance
of the proposed classifier is illustrated on simulated and real-data
examples.
( 2
min )
Recently proposed BERT-based evaluation metrics for text generation perform
well on standard benchmarks but are vulnerable to adversarial attacks, e.g.,
relating to information correctness. We argue that this stems (in part) from
the fact that they are models of semantic similarity. In contrast, we develop
evaluation metrics based on Natural Language Inference (NLI), which we deem a
more appropriate modeling choice. We design a preference-based adversarial attack
framework and show that our NLI based metrics are much more robust to the
attacks than the recent BERT-based metrics. On standard benchmarks, our NLI
based metrics outperform existing summarization metrics, but perform below SOTA
MT metrics. However, when combining existing metrics with our NLI metrics, we
obtain both higher adversarial robustness (15%-30%) and higher quality metrics
as measured on standard benchmarks (+5% to 30%).
( 2
min )
The accuracy of tinyML applications is often affected by various
environmental factors, such as noise, sensor location and calibration, and
time-related changes. This article introduces a neural-network-based on-device
learning (ODL) approach to address this issue by retraining in deployed
environments. Our approach relies on semi-supervised sequential training of
multiple neural networks tailored for low-end edge devices. This article
introduces its algorithm and implementation on wireless sensor nodes consisting
of a Raspberry Pi Pico and low-power wireless module. Experiments using
vibration patterns of rotating machines demonstrate that retraining by ODL
improves anomaly detection accuracy compared with a prediction-only deep neural
network in a noisy environment. The results also show that the ODL approach can
save communication cost and energy consumption for battery-powered Internet of
Things devices.
( 2
min )
Most fair machine learning methods either rely heavily on the sensitive
information of the training samples or require large modifications to the
target models, which hinders their practical application. To address this
issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the
loss over the reweighted data set (second stage) where the sample weights are
computed to balance the model performance across different demographic groups
(first stage). FAIRIF can be applied on a wide range of models trained by
stochastic gradient descent without changing the model, while only requiring
group annotations on a small validation set to compute sample weights.
Theoretically, we show that, in the classification setting, three notions of
disparity among different groups can be mitigated by training with the weights.
Experiments on synthetic data sets demonstrate that FAIRIF yields models with
better fairness-utility trade-offs against various types of bias; and on
real-world data sets, we show the effectiveness and scalability of FAIRIF.
Moreover, as evidenced by the experiments with pretrained models, FAIRIF is
able to alleviate the unfairness issue of pretrained models without hurting
their performance.
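The two-stage recipe can be caricatured in a few lines (a hypothetical simplification: FAIRIF derives its weights from influence functions, whereas this sketch simply upweights groups with higher validation loss before the weighted retraining stage):

```python
import numpy as np

def group_balancing_weights(val_losses, groups):
    """Stage 1 (sketch): give each training group a weight proportional to
    its mean validation loss, so worse-served demographic groups count more
    in the stage-2 reweighted training loss. Weights are normalized to mean 1."""
    val_losses = np.asarray(val_losses, dtype=float)
    groups = np.asarray(groups)
    weights = np.empty_like(val_losses)
    for g in np.unique(groups):
        mask = groups == g
        weights[mask] = val_losses[mask].mean()
    return weights / weights.mean()
```

Stage 2 then minimizes the weighted sum of per-sample losses with any SGD-trained model, leaving the architecture untouched.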
( 3
min )
Molecular design based on generative models, such as variational autoencoders
(VAEs), has become increasingly popular in recent years due to its efficiency
for exploring high-dimensional molecular space to identify molecules with
desired properties. While the efficacy of the initial model strongly depends on
the training data, the sampling efficiency of the model for suggesting novel
molecules with enhanced properties can be further enhanced via latent space
optimization. In this paper, we propose a multi-objective latent space
optimization (LSO) method that can significantly enhance the performance of
generative molecular design (GMD). The proposed method adopts an iterative
weighted retraining approach, where the respective weights of the molecules in
the training data are determined by their Pareto efficiency. We demonstrate
that our multi-objective GMD LSO method can significantly improve the
performance of GMD for jointly optimizing multiple molecular properties.
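To illustrate the weighting idea (the rank-to-weight mapping below is an assumed form, not the paper's exact scheme), one can peel successive Pareto fronts over the molecules' property vectors and decay each molecule's retraining weight with its front rank:

```python
import numpy as np

def pareto_front(props):
    """Mask of non-dominated rows, maximizing every property column."""
    n = len(props)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and np.all(props[j] >= props[i]) and np.any(props[j] > props[i]):
                mask[i] = False
                break
    return mask

def pareto_rank_weights(props, k=1.0):
    """Peel Pareto fronts to assign each molecule a rank (0 = non-dominated),
    then weight training samples as exp(-k * rank) for weighted retraining."""
    props = np.asarray(props, dtype=float)
    ranks = np.zeros(len(props), dtype=int)
    remaining = np.arange(len(props))
    rank = 0
    while remaining.size:
        front = pareto_front(props[remaining])
        ranks[remaining[front]] = rank
        remaining = remaining[~front]
        rank += 1
    return np.exp(-k * ranks)
```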
( 2
min )
Intent Detection is one of the core tasks of dialog systems. Few-shot Intent
Detection is challenging due to the limited number of annotated utterances for
novel classes. Generalized few-shot intent detection is a more realistic but
challenging setup that aims to discriminate the joint label space of both
novel intents, which have only a few examples each, and existing intents with
enough labeled data. Large label spaces and fewer shots increase the
complexity of the task. In this work, we employ a simple and effective method
based on Natural Language Inference that leverages the semantics in the
class-label names to learn and predict the novel classes. Our method achieves
state-of-the-art results on the 1-shot and 5-shot intent detection tasks, with
gains ranging from 2 to 8 percentage points in F1 score on four benchmark datasets. Our method
also outperforms existing approaches on a more practical setting of generalized
few-shot intent detection with gains up to 20% F1 score. We show that the
suggested approach performs well across single and multi domain datasets with
the number of class labels from as few as 7 to as high as 150.
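A sketch of the recipe (the hypothesis template and the `entail_score` interface are assumptions; in practice the scorer would be a pretrained NLI model):

```python
def classify_intent_via_nli(utterance, intent_labels, entail_score):
    """Recast intent detection as NLI: treat the utterance as the premise,
    template a hypothesis from each class-label name, and predict the label
    whose hypothesis is most strongly entailed. Because the hypotheses carry
    the label-name semantics, novel classes need no class-specific parameters."""
    hypotheses = {
        label: "The user wants to " + label.replace("_", " ")
        for label in intent_labels
    }
    scores = {label: entail_score(utterance, h) for label, h in hypotheses.items()}
    return max(scores, key=scores.get)
```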
( 2
min )
The modeling and control of complex physical systems are essential in
real-world problems. We propose a novel framework that is generally applicable
to solving PDE-constrained optimal control problems by introducing surrogate
models for PDE solution operators with special regularizers. The procedure of
the proposed framework is divided into two phases: solution operator learning
for PDE constraints (Phase 1) and searching for optimal control (Phase 2). Once
the surrogate model is trained in Phase 1, the optimal control can be inferred
in Phase 2 without intensive computations. Our framework can be applied to both
data-driven and data-free cases. We demonstrate the successful application of
our method to various optimal control problems for different control variables
with diverse PDE constraints from the Poisson equation to Burgers' equation.
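A toy numerical sketch of the two phases, with a linear least-squares surrogate standing in for the neural solution operator (the linear operator, dimensions, and step size below are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = rng.normal(size=(4, 3))          # unknown control-to-state map
U = rng.normal(size=(50, 3))              # sampled controls
S = U @ A_true.T                          # corresponding "PDE solutions" (data-driven case)

# Phase 1: learn a surrogate for the solution operator from (control, state) pairs.
A_hat = np.linalg.lstsq(U, S, rcond=None)[0].T

# Phase 2: search for the control driving the state toward a target, using only
# the cheap surrogate (no further PDE solves needed).
s_target = rng.normal(size=4)
u = np.zeros(3)
lr = 1.0 / np.linalg.norm(A_hat, 2) ** 2  # safe gradient step for this quadratic
for _ in range(500):
    u -= lr * A_hat.T @ (A_hat @ u - s_target)
```

Once Phase 1 is done, Phase 2 amounts to cheap gradient descent through the surrogate, which is the sense in which optimal control can be inferred without intensive computation.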
( 2
min )
Reinforcement learning has been used to train policies that outperform even
the best human players in various games. However, a large amount of data is
needed to achieve good performance, which in turn requires building large-scale
frameworks and simulators. In this paper, we study how large-scale
reinforcement learning can be applied to autonomous driving, analyze how the
resulting policies perform as the experiment size is scaled, and what the most
important factors contributing to policy performance are. To do this, we first
introduce a hardware-accelerated autonomous driving simulator, which allows us
to efficiently collect experience from billions of agent steps. This simulator
is paired with a large-scale, multi-GPU reinforcement learning framework. We
demonstrate that simultaneous scaling of dataset size, model size, and agent
steps trained provides increasingly strong driving policies in regard to
collision, traffic rule violations, and progress. In particular, our best
policy reduces the failure rate by 57% while improving progress by 23% compared
to the current state-of-the-art machine learning policies for autonomous
driving.
( 2
min )
Resistor networks have recently had a surge of interest as substrates for
energy-efficient self-learning machines. This work studies the computational
capabilities of these resistor networks. We show that electrical networks
composed of voltage sources, linear resistors, diodes and voltage-controlled
voltage sources (VCVS) can implement any continuous function. To prove this, we
assume that the circuit elements are ideal and that the conductances of
variable resistors and the amplification factors of the VCVS's can take
arbitrary values -- arbitrarily small or arbitrarily large. The constructive
nature of our proof could also inform the design of such self-learning
electrical networks.
( 2
min )
The integration of different imaging modalities, such as structural,
diffusion tensor, and functional magnetic resonance imaging, with deep learning
models has yielded promising outcomes in discerning phenotypic characteristics
and enhancing disease diagnosis. The development of such a technique hinges on
the efficient fusion of heterogeneous multimodal features, which initially
reside within distinct representation spaces. Naively fusing the multimodal
features does not adequately capture the complementary information and could
even produce redundancy. In this work, we present a novel joint self-supervised
and supervised contrastive learning method to learn the robust latent feature
representation from multimodal MRI data, allowing the projection of
heterogeneous features into a shared common space, and thereby amalgamating
both complementary and analogous information across various modalities and
among similar subjects. We performed a comparative analysis between our
proposed method and alternative deep multimodal learning approaches. Through
extensive experiments on two independent datasets, the results demonstrated
that our method is significantly superior to several other deep multimodal
learning methods in predicting abnormal neurodevelopment. Our method has the
capability to facilitate computer-aided diagnosis within clinical practice,
harnessing the power of multimodal data.
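The cross-modal part of such an objective can be sketched as a standard InfoNCE-style contrastive loss (a generic stand-in; the paper's exact formulation, temperature, and embedding shapes may differ):

```python
import numpy as np

def cross_modal_contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss over two modalities projected into a shared space:
    row i of z1 and row i of z2 come from the same subject (positives);
    all other pairs act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                      # cosine similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))            # positives on the diagonal
```

Minimizing this loss pulls the two modality embeddings of the same subject together while pushing different subjects apart, which is what projects heterogeneous features into a shared common space.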
( 2
min )
Model stores offer third-party ML models and datasets for easy project
integration, minimizing coding efforts. One might hope to find detailed
specifications of these models and datasets in the documentation, leveraging
documentation standards such as model and dataset cards. In this study, we use
statistical analysis and hybrid card sorting to assess the state of the
practice of documenting model cards and dataset cards in one of the largest
model stores in use today--Hugging Face (HF). Our findings show that only
21,902 models (39.62%) and 1,925 datasets (28.48%) have documentation.
Furthermore, we observe inconsistency in ethics and transparency-related
documentation for ML models and datasets.
( 2
min )
We propose an adaptive model-predictive controller that balances driving the
system to a goal state and seeking system observations that are informative
with respect to the parameters of a nonlinear autoregressive exogenous model.
The controller's objective function is derived from an expected free energy
functional and contains information-theoretic terms expressing uncertainty over
model parameters and output predictions. Experiments illustrate how parameter
uncertainty affects the control objective and evaluate the proposed controller
for a pendulum swing-up task.
( 2
min )
Numerous regularization methods for deformable image registration aim at
enforcing smooth transformations, but are difficult to tune a priori and
lack a clear physical basis. Physically inspired strategies have emerged,
offering a sound theoretical basis, but still necessitating complex
discretization and resolution schemes. This study introduces a regularization
strategy that does not require discretization, making it compatible with
current registration frameworks, while retaining the benefits of physically
motivated regularization for medical image registration. The proposed method
performs favorably in both synthetic and real datasets, exhibiting an accuracy
comparable to current state-of-the-art methods.
( 2
min )
Tensorial neural networks (TNNs) combine the successes of multilinear algebra
with those of deep learning to enable extremely efficient reduced-order models
of high-dimensional problems. Here, I describe a deep neural network
architecture that fuses multiple TNNs into a larger network, intended to solve
a broader class of problems than a single TNN. I evaluate this architecture,
referred to as a "stacked tensorial neural network" (STNN), on a parametric PDE
with three independent variables and three parameters. The three parameters
correspond to one PDE coefficient and two quantities describing the domain
geometry. The STNN provides an accurate reduced-order description of the
solution manifold over a wide range of parameters. There is also evidence of
meaningful generalization to parameter values outside its training data.
Finally, while the STNN architecture is relatively simple and problem agnostic,
it can be regularized to incorporate problem-specific features like symmetries
and physical modeling assumptions.
( 2
min )
In this paper we define a population parameter, the "Generalized Variable
Importance Metric" (GVIM), to measure the importance of predictors for black-box
machine learning methods, where the importance is not represented by a
model-based parameter. GVIM is defined for each input variable, using the true
conditional expectation function, and it measures the variable's importance in
affecting a continuous or a binary response. We extend previously published
results to show that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for any kind of a predictor, which
gives it a causal interpretation and further justification as an alternative to
classical measures of significance that are only available in simple parametric
models. An extensive set of simulations, using realistically complex relationships
between covariates and outcomes and a number of regression techniques of varying
degrees of complexity, demonstrates the performance of our proposed estimator of the
GVIM.
( 2
min )
In stochastic zeroth-order optimization, a problem of practical relevance is
understanding how to fully exploit the local geometry of the underlying
objective function. We consider a fundamental setting in which the objective
function is quadratic, and provide the first tight characterization of the
optimal Hessian-dependent sample complexity. Our contribution is twofold.
First, from an information-theoretic point of view, we prove tight lower bounds
on Hessian-dependent complexities by introducing a concept called energy
allocation, which captures the interaction between the searching algorithm and
the geometry of objective functions. A matching upper bound is obtained by
solving the optimal energy spectrum. Then, algorithmically, we show the
existence of a Hessian-independent algorithm that universally achieves the
asymptotic optimal sample complexities for all Hessian instances. The optimal
sample complexities achieved by our algorithm remain valid for heavy-tailed
noise distributions, which are enabled by a truncation method.
( 2
min )
We introduce a pivot for exact selective inference with randomization. Not
only does our pivot lead to exact inference in Gaussian regression models, but
it is also available in closed form. We reduce the problem of exact selective
inference to a bivariate truncated Gaussian distribution. By doing so, we give
up some power that is achieved with approximate maximum likelihood estimation
in Panigrahi and Taylor (2022). Yet our pivot always produces narrower
confidence intervals than a closely related data splitting procedure. We
investigate the trade-off between power and exact selective inference on
simulated datasets and an HIV drug resistance dataset.
( 2
min )
The Implicitly Normalized Forecaster (INF) algorithm is considered to be an
optimal solution for adversarial multi-armed bandit (MAB) problems. However,
most of the existing complexity results for INF rely on restrictive
assumptions, such as bounded rewards. Recently, a related algorithm was
proposed that works for both adversarial and stochastic heavy-tailed MAB
settings. However, this algorithm fails to fully exploit the available data.
In this paper, we propose a new version of INF called the Implicitly
Normalized Forecaster with clipping (INF-clip) for MAB problems with
heavy-tailed reward distributions. We establish convergence results under mild
assumptions on the rewards distribution and demonstrate that INF-clip is
optimal for linear heavy-tailed stochastic MAB problems and works well for
non-linear ones. Furthermore, we show that INF-clip outperforms the
best-of-both-worlds algorithm in cases where it is difficult to distinguish
between different arms.
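The core device, truncating heavy-tailed rewards before the update, can be sketched as follows (a simplification: the actual INF-clip uses an implicitly normalized, Tsallis-entropy-style update rather than the plain exponential-weights loop shown here, and the threshold and learning rate are illustrative):

```python
import numpy as np

def exp3_with_clipping(pull, n_arms, horizon, tau, eta, seed=0):
    """Bandit loop that clips each observed reward at level tau before the
    importance-weighted update, which keeps heavy-tailed rewards from blowing
    up the estimates."""
    rng = np.random.default_rng(seed)
    log_w = np.zeros(n_arms)
    counts = np.zeros(n_arms, dtype=int)
    for _ in range(horizon):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(n_arms, p=p)
        reward = float(np.clip(pull(arm), -tau, tau))   # the clipping step
        log_w[arm] += eta * reward / p[arm]             # importance weighting
        counts[arm] += 1
    return counts
```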
( 2
min )
Arunachalam and de Wolf (2018) showed that the sample complexity of quantum
batch learning of boolean functions, in the realizable and agnostic settings,
has the same form and order as the corresponding classical sample complexities.
In this paper, we extend this ostensibly surprising message to batch
multiclass learning, online boolean learning, and online multiclass learning.
For our online learning results, we first consider an adaptive adversary
variant of the classical model of Dawid and Tewari (2022). Then, we introduce
the first (to the best of our knowledge) model of online learning with quantum
examples.
( 2
min )
We introduce a new approach for generating sequences of implied volatility
(IV) surfaces across multiple assets that is faithful to historical prices. We
do so using a combination of functional data analysis and neural stochastic
differential equations (SDEs) combined with a probability integral transform
penalty to reduce model misspecification. We demonstrate that learning the
joint dynamics of IV surfaces and prices produces market scenarios that are
consistent with historical features and lie within the sub-manifold of surfaces
that are essentially free of static arbitrage. Finally, we demonstrate that
delta hedging using the simulated surfaces generates profit and loss (P&L)
distributions that are consistent with realised P&Ls.
( 2
min )
We consider the problem of sufficient dimension reduction (SDR) for
multi-index models. The estimators of the central mean subspace in prior works
either have slow (non-parametric) convergence rates, or rely on stringent
distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$
being elliptically symmetric). In this paper, we show that a fast parametric
convergence rate of form $C_d \cdot n^{-1/2}$ is achievable via estimating the
\emph{expected smoothed gradient outer product}, for a general class of
distributions $P_{\mathbf{X}}$, including Gaussian and heavier-tailed ones. When
the link function is a polynomial with a degree of at most $r$ and
$P_{\mathbf{X}}$ is the standard Gaussian, we show that the prefactor depends
on the ambient dimension $d$ as $C_d \propto d^r$.
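Why the gradient outer product identifies the central mean subspace can be seen in a toy check; here exact gradients stand in for the paper's smoothed nonparametric estimates (an illustrative simplification of the actual estimator):

```python
import numpy as np

# Single-index model m(x) = (b^T x)^2: the gradient 2(b^T x) b always lies
# in span{b}, so E[grad m(X) grad m(X)^T] is rank one with column space
# equal to the central mean subspace.
rng = np.random.default_rng(0)
d = 5
b = np.zeros(d)
b[0] = 1.0
X = rng.normal(size=(2000, d))
grads = 2.0 * (X @ b)[:, None] * b[None, :]   # exact gradients of m at each sample
G = grads.T @ grads / len(X)                  # empirical gradient outer product
leading = np.linalg.eigh(G)[1][:, -1]         # top eigenvector recovers +/- b
```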
( 2
min )
Unsupervised learning has become a staple in classical machine learning,
successfully identifying clustering patterns in data across a broad range of
domain applications. Surprisingly, despite its accuracy and elegant simplicity,
unsupervised learning has not been sufficiently exploited in the realm of
phylogenetic tree inference. The main reason for the delay in adoption of
unsupervised learning in phylogenetics is the lack of a meaningful, yet simple,
way of embedding phylogenetic trees into a vector space. Here, we propose the
simple yet powerful split-weight embedding which allows us to fit standard
clustering algorithms to the space of phylogenetic trees. We show that our
split-weight embedded clustering is able to recover meaningful evolutionary
relationships in simulated and real (Adansonia baobabs) data.
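A minimal version of such an embedding (an assumed formulation: one coordinate per nontrivial split of the taxon set, canonicalized as the side not containing the first taxon, with the coordinate equal to that split's branch weight, or 0 when the tree lacks the split):

```python
from itertools import combinations

def split_weight_embedding(taxa, tree_splits):
    """Map a phylogenetic tree into a fixed vector space: enumerate all
    nontrivial splits of the taxon set (each represented by the side that
    excludes the alphabetically first taxon) and record each split's branch
    weight, using 0.0 when the tree does not contain that split. Trees become
    points in R^k, so standard clustering algorithms apply directly."""
    taxa = sorted(taxa)
    others = taxa[1:]
    axis, vector = [], []
    for size in range(2, len(taxa) - 1):
        for side in combinations(others, size):
            key = frozenset(side)
            axis.append(key)
            vector.append(tree_splits.get(key, 0.0))
    return axis, vector
```

For four taxa this yields a three-dimensional space, one coordinate per possible internal edge of an unrooted tree.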
( 2
min )
Predicting audio quality in voice synthesis and conversion systems is a
critical yet challenging task, especially when traditional methods like Mean
Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses
the gap in efficient audio quality prediction, especially in low-resource
settings where extensive MOS data from large-scale listening tests may be
unavailable. We demonstrate that uncertainty measures derived from
out-of-the-box pretrained self-supervised learning (SSL) models, such as
wav2vec, correlate with MOS scores. These findings are based on data from the
2022 and 2023 VoiceMOS challenges. We explore the extent of this correlation
across different models and language contexts, revealing insights into how
inherent uncertainties in SSL models can serve as effective proxies for audio
quality assessment. In particular, we show that the contrastive wav2vec models
are the most performant in all settings.
( 2
min )
Deep Neural Networks (DNNs) are powerful tools for various computer vision
tasks, yet they often struggle with reliable uncertainty quantification - a
critical requirement for real-world applications. Bayesian Neural Networks
(BNN) are equipped for uncertainty estimation but cannot scale to large DNNs
that are highly unstable to train. To address this challenge, we introduce the
Adaptable Bayesian Neural Network (ABNN), a simple and scalable strategy to
seamlessly transform DNNs into BNNs in a post-hoc manner with minimal
computational and training overheads. ABNN preserves the main predictive
properties of DNNs while enhancing their uncertainty quantification abilities
through simple BNN adaptation layers (attached to normalization layers) and a
few fine-tuning steps on pre-trained models. We conduct extensive experiments
across multiple datasets for image classification and semantic segmentation
tasks, and our results demonstrate that ABNN achieves state-of-the-art
performance without the computational budget typically associated with ensemble
methods.
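The adaptation-layer idea can be sketched as perturbing a normalization layer's scale at inference time, so that repeated forward passes sample a predictive distribution (the Gaussian perturbation, shapes, and function names are illustrative assumptions, not ABNN's exact parameterization):

```python
import numpy as np

def bayesian_norm_layer(x, gamma, beta, sigma, rng):
    """Normalization layer whose scale is resampled per forward pass:
    gamma is perturbed by Gaussian noise of scale sigma, turning a
    deterministic layer into a cheap Bayesian one."""
    gamma_sample = gamma + sigma * rng.normal(size=gamma.shape)
    x_norm = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-5)
    return gamma_sample * x_norm + beta

def predictive_spread(x, gamma, beta, sigma, n_samples=64, seed=0):
    """Uncertainty proxy: mean std of outputs across sampled forward passes."""
    rng = np.random.default_rng(seed)
    outs = np.stack([bayesian_norm_layer(x, gamma, beta, sigma, rng)
                     for _ in range(n_samples)])
    return float(outs.std(axis=0).mean())
```

Larger perturbation scales produce wider predictive spreads, which is the handle a post-hoc adaptation can fine-tune on a pre-trained model.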
( 2
min )
Sponsored Post: Attend the Data Science Symposium 2022 on November 8. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
( 10
min )